Open NirvanaCh opened 6 months ago
I downloaded the rankings database:
$ cat hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.sha1sum.txt
1688a925f22d312769798258d990f13866bb4924 hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
$ head hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Hash-Lengths: 2,3,6
Length: 35192956928
MTime: Thu, 07 Jul 2022 14:35:59 +0000
SHA-1: 95c823ee1e19f68ce0c82f79042cdc1007018ddb
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
zsync: 2.0.0-alpha-1
(binary block-checksum data follows)
$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
95c823ee1e19f68ce0c82f79042cdc1007018ddb hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
An error occurred:
ValueError: "/m/tutor/database/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather" is not a cisTarget Feather database in Feather v1 or v2 format.
ctxcore/ctdb.py:
......
def is_feather_v1_or_v2(feather_filename: Union[Path, str]) -> Optional[int]:
    """
    Check if the passed filename is a Feather v1 or v2 file.

    :param feather_filename: Feather v1 or v2 filename.
    :return: 1 (for Feather version 1), 2 (for Feather version 2) or None.
    """
    with open(feather_filename, "rb") as fh_feather:
        # Read first 6 and last 6 bytes to see if we have a Feather v2 file.
        fh_feather.seek(0, 0)
        feather_v2_magic_bytes_header = fh_feather.read(6)
        fh_feather.seek(-6, 2)
        feather_v2_magic_bytes_footer = fh_feather.read(6)

        if feather_v2_magic_bytes_header == feather_v2_magic_bytes_footer == b"ARROW1":
            # Feather v2 file.
            return 2

        # Read first 4 and last 4 bytes to see if we have a Feather v1 file.
        feather_v1_magic_bytes_header = feather_v2_magic_bytes_header[0:4]
        feather_v1_magic_bytes_footer = feather_v2_magic_bytes_footer[2:]

        if feather_v1_magic_bytes_header == feather_v1_magic_bytes_footer == b"FEA1":
            # Feather v1 file.
            return 1

    # Some other file format.
    return None
......
$ head -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
(non-printable bytes, not ARROW1)
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
00176-
The local file sizes are incorrect (compare Size below with the server's Content-Length):
$ stat hg38_screen_v10_clust.regions_vs_motifs.*.feather
File: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Size: 35192956928 Blocks: 68736272 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643438 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:10:29.183467890 +0800
Modify: 2022-07-07 14:35:59.000000000 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:40.146629410 +0800
File: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Size: 13882267648 Blocks: 27113824 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643440 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:48:38.146833263 +0800
Modify: 2024-05-08 23:28:39.283831255 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:43.862621727 +0800
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:52:22 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:35:59 GMT
ETag: "831a9eca2-5e338010f31c0"
Accept-Ranges: bytes
Content-Length: 35192958114
X-Frame-Options: sameorigin
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:56:51 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:31:02 GMT
ETag: "33b729822-5e337ef5b5580"
Accept-Ranges: bytes
Content-Length: 13882267682
X-Frame-Options: sameorigin
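So the 'zsync' files are incorrect. The Length recorded in the rankings .zsync header (35192956928) matches my local file, but the server reports a Content-Length of 35192958114, so 1186 bytes are missing at the end. The scores file is likewise 34 bytes short (13882267648 vs. 13882267682).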
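The size comparison above can also be scripted. This is only a minimal sketch using the Python standard library; `remote_content_length` and `missing_bytes` are hypothetical helper names I made up for illustration, not part of ctxcore or zsync.

```python
import urllib.request


def remote_content_length(url: str) -> int:
    """Return the Content-Length the server reports for a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])


def missing_bytes(local_size: int, remote_size: int) -> int:
    """Return how many trailing bytes the local file lacks (0 if complete)."""
    if local_size > remote_size:
        raise ValueError("local file is larger than the remote file")
    return remote_size - local_size


# Sizes copied from the `stat` and `curl -I` output in this issue.
print(missing_bytes(35192956928, 35192958114))  # rankings: prints 1186
print(missing_bytes(13882267648, 13882267682))  # scores: prints 34
```

For a live check, something like `missing_bytes(os.path.getsize(path), remote_content_length(url))` should work (with `import os`), assuming the server answers HEAD requests as it does in the `curl -I` output.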
I fixed it using 'curl -C -':
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
** Resuming transfer from byte position 35192956928
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1186  100  1186    0     0    989      0  0:00:01  0:00:01 --:--:--   989
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
** Resuming transfer from byte position 13882267648
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    34  100    34    0     0     29      0  0:00:01  0:00:01 --:--:--    29
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*feather
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
It looks like it's working now.
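For anyone without curl, the same resume can be sketched in Python with an HTTP Range request. This is a rough equivalent under stated assumptions, not what curl does internally: `resume_request` and `resume_download` are hypothetical names, and it relies on the server supporting byte ranges (the `Accept-Ranges: bytes` header above suggests it does).

```python
import os
import urllib.request


def resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request for only the bytes missing from local_path."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})


def resume_download(url: str, local_path: str, chunk_size: int = 1 << 20) -> int:
    """Append the missing tail of the remote file; return the bytes written."""
    request = resume_request(url, local_path)
    written = 0
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the Range header.
        # A plain 200 would resend the whole file, and appending that
        # would corrupt the local copy, so refuse to continue.
        if response.status != 206:
            raise RuntimeError(f"expected 206, got {response.status}")
        with open(local_path, "ab") as fh:
            while chunk := response.read(chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

If the local file is already complete, the server will most likely answer 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`. Checking `tail -c 6` or the SHA-1 afterwards is still a good idea.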
To summarize,
Best wishes
The zsync files have been removed for now, as zsync has had issues with big files (larger than 2G) for a long time.
It looks like the zsync2 bug (https://github.com/AppImageCommunity/zsync2/issues/31) might finally be resolved in a fork of zsync2: https://github.com/NiLuJe/zsync2/commit/a8e2d68e3f03315835f6d6fb9f74a26c3ea000b9
I'm sorry for submitting an issue here. I tried to download these databases using zsync.
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Pay attention to the SHA-1 checksum.
As you can see, its SHA-1 value matches the one recorded in the 'zsync' file's header, but differs from the one recorded in 'sha1sum.txt'.
I hope it's not my fault, as redownloading is a bit of a hassle.
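To check which of the two recorded checksums a downloaded file matches, something like the sketch below might help. It assumes the `.zsync` file begins with plain `Key: Value` lines terminated by a blank line (as the `head` output in this issue suggests) and that `sha1sum.txt` has the usual `<hash> <filename>` layout; all three function names are hypothetical.

```python
import hashlib


def read_zsync_header(zsync_path: str) -> dict:
    """Parse the 'Key: Value' text header that precedes the binary data."""
    header = {}
    with open(zsync_path, "rb") as fh:
        for raw_line in fh:
            line = raw_line.rstrip(b"\r\n")
            if not line:  # a blank line ends the text header
                break
            text = line.decode("ascii", errors="replace")
            key, _, value = text.partition(": ")
            header[key] = value
    return header


def read_sha1sum_file(path: str) -> str:
    """Return the hash from the first line of a sha1sum-style text file."""
    with open(path) as fh:
        return fh.readline().split()[0]


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) file in chunks to keep memory use flat."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

In my case the truncated download matched the `SHA-1` field of the `.zsync` header but not `sha1sum.txt`, so comparing `sha1_of_file(...)` against both values would have shown the problem before running cisTarget.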