WGLab / DeepRepeat

An accurate repeat detection from Nanopore data using deep learning and image techniques

Impossible to open/read FAST5 files #5

Open GianlucaDamaggio opened 2 years ago

GianlucaDamaggio commented 2 years ago

Hi there,

I am very interested in this software, but I am having difficulty running it on my data.

I receive the error below, which says that it is impossible to open/read the FAST5 files.

I have checked if the path for the FAST5 files is correct and it is.

I am analysing barcoded data. One hypothesis is that the single f5.f5index I am using covers the whole run, whereas the BAM contains only about 200 of the 9 million sequenced reads.

Any suggestions?

Error! cannot read fast5 file= /home/xxx/xxx/xxx/fast5/FAT20294_cfb18e17_1125.fast5
For: Qry= cb7ddec0-7da2-44e9-87c8-fc9bcfbe1930:717(717)-1518 with=1728/1728 Map= +chr4:3074876-3075648 Map-flag=0 qual= cigar-len=139/139
c_rep_regions_in_whole_0 chr4:3074876-3074940:3
c_rep_regions_in_whole_1 chr4:3074938-3074968:3
c_rep_regions_in_whole_2 chr4:3075017-3075049:12
c_rep_regions_0 chr4:3074876-3074933:3
Read fast5 file =  /home/xxx/xxx/xxx/fast5/FAT20294_cfb18e17_1125.fast5
Error!!! Canot open file  /home/xxx/xxx/xxx/fast5/FAT20294_cfb18e17_1125.fast5
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 140551523776320:
  #000: H5G.c line 472 in H5Gopen2(): unable to open group
    major: Symbol table
    minor: Can't open object
  #001: H5Gint.c line 287 in H5G__open_name(): group not found
    major: Symbol table
    minor: Object not found
  #002: H5Gloc.c line 428 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 867 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #004: H5Gtraverse.c line 639 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #005: H5Gloc.c line 383 in H5G_loc_find_cb(): object 'channel_id' doesn't exist
    major: Symbol table
    minor: Object not found
liuqianhn commented 2 years ago

@GianlucaDamaggio please share the output of h5ls -r /home/xxx/xxx/xxx/fast5/FAT20294_cfb18e17_1125.fast5 | head -n 50. Also, how did you basecall the data?

GianlucaDamaggio commented 2 years ago

Thanks for your reply,

The data were basecalled with Guppy v5.0.11.

Here is the output of "h5ls -r":

/ Group
/read_001664c4-4059-4c83-9e32-89c0774fbc74 Group
/read_001664c4-4059-4c83-9e32-89c0774fbc74/Raw Group
/read_001664c4-4059-4c83-9e32-89c0774fbc74/Raw/Signal Dataset {12742/Inf}
/read_001664c4-4059-4c83-9e32-89c0774fbc74/channel_id Group
/read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags Group
/read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id Group
/read_003fb8ab-2e18-4005-9144-b29462114d6a Group
/read_003fb8ab-2e18-4005-9144-b29462114d6a/Raw Group
/read_003fb8ab-2e18-4005-9144-b29462114d6a/Raw/Signal Dataset {17501/Inf}
/read_003fb8ab-2e18-4005-9144-b29462114d6a/channel_id Group
/read_003fb8ab-2e18-4005-9144-b29462114d6a/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_003fb8ab-2e18-4005-9144-b29462114d6a/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_0046143b-a049-4851-b8e7-c851440fc51f Group
/read_0046143b-a049-4851-b8e7-c851440fc51f/Raw Group
/read_0046143b-a049-4851-b8e7-c851440fc51f/Raw/Signal Dataset {17867/Inf}
/read_0046143b-a049-4851-b8e7-c851440fc51f/channel_id Group
/read_0046143b-a049-4851-b8e7-c851440fc51f/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_0046143b-a049-4851-b8e7-c851440fc51f/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_007f9d99-28e0-4811-947e-3f444982c171 Group
/read_007f9d99-28e0-4811-947e-3f444982c171/Raw Group
/read_007f9d99-28e0-4811-947e-3f444982c171/Raw/Signal Dataset {19931/Inf}
/read_007f9d99-28e0-4811-947e-3f444982c171/channel_id Group
/read_007f9d99-28e0-4811-947e-3f444982c171/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_007f9d99-28e0-4811-947e-3f444982c171/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a Group
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a/Raw Group
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a/Raw/Signal Dataset {19749/Inf}
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a/channel_id Group
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_008a7b6a-8192-4bc0-af7a-7dcbde34277a/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_008e026b-b58f-4350-82aa-320431a18ab2 Group
/read_008e026b-b58f-4350-82aa-320431a18ab2/Raw Group
/read_008e026b-b58f-4350-82aa-320431a18ab2/Raw/Signal Dataset {19690/Inf}
/read_008e026b-b58f-4350-82aa-320431a18ab2/channel_id Group
/read_008e026b-b58f-4350-82aa-320431a18ab2/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_008e026b-b58f-4350-82aa-320431a18ab2/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_00997882-03f3-4590-9a0b-48839f56c0be Group
/read_00997882-03f3-4590-9a0b-48839f56c0be/Raw Group
/read_00997882-03f3-4590-9a0b-48839f56c0be/Raw/Signal Dataset {22647/Inf}
/read_00997882-03f3-4590-9a0b-48839f56c0be/channel_id Group
/read_00997882-03f3-4590-9a0b-48839f56c0be/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_00997882-03f3-4590-9a0b-48839f56c0be/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_009af276-c6ce-4681-be7f-b41ca8433ddb Group
/read_009af276-c6ce-4681-be7f-b41ca8433ddb/Raw Group
/read_009af276-c6ce-4681-be7f-b41ca8433ddb/Raw/Signal Dataset {17780/Inf}
/read_009af276-c6ce-4681-be7f-b41ca8433ddb/channel_id Group
/read_009af276-c6ce-4681-be7f-b41ca8433ddb/context_tags Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/context_tags
/read_009af276-c6ce-4681-be7f-b41ca8433ddb/tracking_id Group, same as /read_001664c4-4059-4c83-9e32-89c0774fbc74/tracking_id
/read_00a2dfb2-c216-4704-8236-aacd124d60a8 Group
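
For readers hitting the same error, the listing above shows top-level /read_<uuid> groups, which is the multi-read FAST5 layout. A single-read file instead keeps its signal under /Raw/Reads and its metadata under /UniqueGlobalKey, as the second listing later in this thread shows. That would be consistent with the "object 'channel_id' doesn't exist" message, although the thread does not state which path DeepRepeat looks up. A minimal sketch that classifies pasted "h5ls -r" text by layout follows; the function names are hypothetical and it only inspects text, not the HDF5 file itself.

```python
import re

# Multi-read FAST5 files keep one top-level group per read: /read_<uuid>
MULTI_READ_GROUP = re.compile(r"^/read_[0-9a-f-]+$")


def h5ls_paths(listing):
    """Return every whitespace-separated token that looks like an HDF5 path.

    Splitting on whitespace means this works whether the listing has one
    object per line or was pasted as a single flattened line.
    """
    return [token for token in listing.split() if token.startswith("/")]


def fast5_layout(listing):
    """Classify `h5ls -r` text as 'multi', 'single', or 'unknown'."""
    paths = h5ls_paths(listing)
    if any(MULTI_READ_GROUP.match(path) for path in paths):
        return "multi"
    if any(path.startswith("/Raw/Reads/") for path in paths):
        return "single"
    return "unknown"
```

Pasting the first few lines of a listing into fast5_layout() is enough to tell which kind of file is on disk before running a tool that accepts only one of the two layouts.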

liuqianhn commented 2 years ago

@GianlucaDamaggio: DeepRepeat does not support multi-fast5 yet. Please use ont5 to convert the files to single-fast5 format.
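
The thread does not spell out the conversion command. One common route, assuming the tool meant here is Oxford Nanopore's ont_fast5_api package, is its multi_to_single_fast5 command-line script. The sketch below only assembles the argument list; the helper name, the default values, and the directory names in the usage note are hypothetical.

```python
def multi_to_single_command(input_dir, output_dir, threads=1, recursive=True):
    """Build the argument list for ont_fast5_api's multi_to_single_fast5 script.

    Assumes ont_fast5_api is installed and its scripts are on PATH.
    """
    cmd = [
        "multi_to_single_fast5",
        "--input_path", str(input_dir),   # directory holding multi-read FAST5 files
        "--save_path", str(output_dir),   # directory to write single-read files into
        "--threads", str(threads),
    ]
    if recursive:
        # Also search subdirectories of input_dir for FAST5 files
        cmd.append("--recursive")
    return cmd
```

To run the conversion, pass the list to subprocess.run, for example subprocess.run(multi_to_single_command("fast5", "fast5_single", threads=4), check=True). As the rest of this thread shows, conversion alone is not sufficient for DeepRepeat if the files carry no basecall information.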

GianlucaDamaggio commented 2 years ago

Thanks for the pointer, but after converting to single-fast5 I now get this error:

Error! cannot read fast5 file= /home/xxx/xxx/xxx/pass/fast5_single/0/43d900f2-8301-45d0-80fe-024aaaf7e102.fast5
For: Qry= cb7ddec0-7da2-44e9-87c8-fc9bcfbe1930:717(717)-1518 with=1728/1728 Map= +chr4:3074876-3075648 Map-flag=0 qual= cigar-len=139/139
c_rep_regions_in_whole_0 chr4:3074876-3074940:3
c_rep_regions_in_whole_1 chr4:3074938-3074968:3
c_rep_regions_in_whole_2 chr4:3075017-3075049:12
c_rep_regions_0 chr4:3074876-3074933:3
Read fast5 file = /home/xxx/xxx/xxx/pass/fast5_single/0/cb7ddec0-7da2-44e9-87c8-fc9bcfbe1930.fast5
Error!!! Canot open file /home/xxx/xxx/xxx/pass/fast5_single/0/cb7ddec0-7da2-44e9-87c8-fc9bcfbe1930.fast5
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 139627075471168:
  #000: H5D.c line 294 in H5Dopen2(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #001: H5Dint.c line 1362 in H5D__open_name(): not found
    major: Dataset
    minor: Object not found
  #002: H5Gloc.c line 428 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 867 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #004: H5Gtraverse.c line 753 in H5G_traverse_real(): component not found
    major: Symbol table
    minor: Object not found
Error! cannot read fast5 file= /home/xxx/xxx/xxx/pass/fast5_single/0/cb7ddec0-7da2-44e9-87c8-fc9bcfbe1930.fast5
Modused: trainedmod_32_0.2/CAG_htt2_90/AGC_3/200_v0.2motif32/np_dl_3
2022-07-19 10:50:13.247033: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2022-07-19 10:50:13.254401: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

Now with h5ls -r I have this output:

/ Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_42367 Group
/Raw/Reads/Read_42367/Signal Dataset {15343/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

liuqianhn commented 2 years ago

@GianlucaDamaggio DeepRepeat needs basecalled information generated by Albacore v2.3. Please use Albacore v2.3 for basecalling and then run DeepRepeat.
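
A quick way to see the problem: the single-fast5 listing in the previous comment contains only /Raw and /UniqueGlobalKey, with no /Analyses group. Albacore conventionally writes its basecall results into the FAST5 file under /Analyses/Basecall_1D_000. The thread does not say which datasets DeepRepeat reads, so treat that path as an assumption. The sketch below, with a hypothetical function name, checks pasted "h5ls -r" text for such a group.

```python
def has_basecall_group(listing):
    """Return True if `h5ls -r` text mentions an /Analyses/Basecall_1D_* object.

    Works on the text of a listing only; it does not open the FAST5 file.
    """
    return any(
        token.startswith("/Analyses/Basecall_1D_")
        for token in listing.split()
    )
```

If this returns False for a file, the file holds raw signal only, and a tool that expects embedded basecall information has nothing to read from it.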

FAFUshiyan commented 1 year ago

So data basecalled with Guppy cannot be used with DeepRepeat?

liuqianhn commented 1 year ago

@FAFUshiyan This version does not support guppy basecalling.