MBoemo / DNAscent

Software for detecting regions of BrdU and EdU incorporation in Oxford Nanopore reads.
https://www.boemogroup.org/
GNU General Public License v3.0

Issue to run detect #13

liuqianhn closed this issue 3 years ago

liuqianhn commented 3 years ago

Hi, I tried to use your nice tool to analyze your barcoded data (2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100). After running index without sequencing_summary.txt, I got the error below when running detect:

Loading DNAscent index... ok.
Importing reference... ok.
Opening bam file... ok.
Scanning bam file...ok.
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 46912976045824: 0sec  failed:     0
  #000: H5F.c line 591 in H5Fopen(): invalid file name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 46912982349568:
  #000: H5F.c line 591 in H5Fopen(): invalid file name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 46912967640832:
  #000: H5F.c line 591 in H5Fopen(): invalid file name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 46912978147072:
  #000: H5F.c line 591 in H5Fopen(): invalid file name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 46912973944576:
  #000: H5F.c line 591 in H5Fopen(): invalid file name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.14) terminate called recursively
terminate called recursively
thread 46912971843328:
terminate called recursively
terminate called after throwing an instance of 'IOerror'

I also ran detect on other data after running index with sequencing_summary.txt, and did not get this error. How can I fix it? Thanks.

I also found that the index files generated with and without sequencing_summary.txt are different. Without sequencing_summary.txt, the index looks like this:

#bulk
yses  2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_100_ch_468_strand.fast5
iousReadInfo   2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_100_ch_468_strand.fast5
   2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_100_ch_468_strand.fast5
ueGlobalKey 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_100_ch_468_strand.fast5
yses  2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_101_ch_27_strand.fast5
iousReadInfo   2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_101_ch_27_strand.fast5
   2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_101_ch_27_strand.fast5
ueGlobalKey 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_101_ch_27_strand.fast5
yses  2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_105_ch_503_strand.fast5
iousReadInfo   2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/0/ompi174_20180618_FAH65888_MN24299_mux_scan_2018_06_18_CAM_ONT_gDNA_BrdU_BC_67104_read_105_ch_503_strand.fast5

Could you please let me know whether this is correct?
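The first column of the index excerpt above contains fragments such as "yses", "iousReadInfo" and "ueGlobalKey", and some lines have no first column at all. ONT basecallers label each read with a UUID-style read ID, so one quick way to check whether an index looks right is to flag every entry whose first field is not a UUID. This is a minimal sketch, assuming (not confirmed by the DNAscent documentation referenced in this thread) that each entry line is "<readID><whitespace><fast5 path>" and that lines starting with "#", such as "#bulk", are headers. The function names are hypothetical.

```python
import re
import sys

# ONT read IDs are UUIDs: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def parse_index(lines):
    """Yield (line_number, read_id, fast5_path) for each entry line.

    ASSUMPTION: entry lines are "<readID><whitespace><path>" and header
    lines start with '#'. A line that starts with whitespace has an empty
    first column, so it is reported with read_id "".
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0].isspace():
            yield number, "", line.strip()
            continue
        fields = line.split(None, 1)
        path = fields[1].strip() if len(fields) > 1 else ""
        yield number, fields[0], path


def suspicious_entries(lines):
    """Return (line_number, read_id) for entries whose ID is not a UUID."""
    return [
        (number, read_id)
        for number, read_id, _path in parse_index(lines)
        if not UUID_RE.match(read_id)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        bad = suspicious_entries(handle)
    for number, read_id in bad:
        print(f"line {number}: unexpected read ID {read_id!r}")
    print(f"{len(bad)} suspicious entries")
```

Run it as, for example, python check_index_ids.py 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100_barcode12.dnascent (the script name is hypothetical). If the assumed format holds, a healthy index should report no suspicious entries.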

MBoemo commented 3 years ago

Can you let me know which version of Guppy you used to do the basecalling?

liuqianhn commented 3 years ago

Thanks for your reply. I am using Albacore v2.3.4 for now; I have not tried Guppy yet. Does this matter? I can successfully run DNAscent on the data released by the Fork-seq team with the same pipeline.

MBoemo commented 3 years ago

I see - would you mind sharing the command you used to run DNAscent index? The crux of the issue here is that DNAscent can't find the fast5 files, and my guess is that it's down to the file path that was passed to the index subprogram.
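If DNAscent can't find the fast5 files, the paths stored in the index are the first thing to inspect. A minimal sketch, assuming each entry line ends with the fast5 path (as in the index excerpt earlier in this thread) and that DNAscent opens those paths exactly as written, so a relative path would be resolved against the working directory of the process. It lists every index entry whose path is relative or does not point to an existing file. The function names are hypothetical.

```python
import os
import sys


def index_paths(lines):
    """Yield (line_number, fast5_path) for every non-header index line.

    ASSUMPTION: each entry line ends with the fast5 path, after an
    optional read ID and whitespace; lines starting with '#' are headers.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        yield number, line.split(None, 1)[-1]


def path_problems(lines, base_dir="."):
    """Return (line_number, path, reason) for paths that may not open.

    Relative paths are flagged, then resolved against base_dir (meant to
    be the directory DNAscent is launched from) and checked for existence.
    """
    problems = []
    for number, path in index_paths(lines):
        if not os.path.isabs(path):
            problems.append((number, path, "relative path"))
            path = os.path.join(base_dir, path)
        if not os.path.isfile(path):
            problems.append((number, path, "file not found"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for number, path, reason in path_problems(handle):
            print(f"line {number}: {reason}: {path}")
```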

Just note that Albacore is quite a few years out of date now, so you might get better results with a more recent version of Guppy, depending on what you're looking at.

liuqianhn commented 3 years ago

@MBoemo Thanks for your suggestions. Guppy will generate better results.

The command I used is:

./DNAscent/bin/DNAscent index -f basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode12/ -o 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100_barcode12.dnascent

I also tried splitting 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100, together with its sequencing_summary.txt, by barcode, but I still cannot run DNAscent index successfully. Thanks for your suggestion.

MBoemo commented 3 years ago

The path you're passing via the -f flag looks like a relative path rather than the full path. Note the line in the documentation stating that DNAscent needs a full path (see https://dnascent.readthedocs.io/en/latest/index_subprog.html). Use the full path for both the fast5 files and the sequencing summary file and it should work. Closing for now, but let me know if there are any further issues.
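One way to avoid passing a relative path by accident is to resolve the paths before building the command. This sketch uses only the -f, -s and -o flags that appear in this thread; the sequencing_summary.txt location below is a hypothetical placeholder, and the helper name is made up for illustration.

```python
import os
import shlex


def dnascent_index_command(fast5_dir, summary_file, output, binary="DNAscent"):
    """Build a DNAscent index command line with absolute input paths.

    os.path.abspath resolves a relative path against the current working
    directory (and drops any trailing slash) without requiring the file
    to exist, so typos still need to be checked separately.
    """
    return [
        binary,
        "index",
        "-f", os.path.abspath(fast5_dir),
        "-s", os.path.abspath(summary_file),
        "-o", output,
    ]


# Hypothetical inputs modelled on the command shared earlier in the thread.
cmd = dnascent_index_command(
    "basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode12/",
    "basecalled/2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/sequencing_summary.txt",
    "2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100_barcode12.dnascent",
)
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) with DNAscent on your PATH (or set binary to the full path of the executable).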

liuqianhn commented 3 years ago

@MBoemo Thanks for your suggestions. I ran DNAscent index with both a full path and a relative path, but I always get an error saying that a certain fast5 file cannot be opened (however, I can access and open the fast5 files in my terminal). In fact, when I run DNAscent index on other datasets, I do not get this error, even with a relative path. I'm not sure why.

MBoemo commented 3 years ago

Sorry to hear that this is still an issue; I'm sure we can get it sorted out. Just to confirm: this seems to be one problematic fast5 file from a particular run, but all the other fast5 files in that run (and all the fast5 files from other datasets) seem okay? I've not seen this particular issue before, and it's the kind of thing that would be hard to reproduce on my end. Would you be willing to send me the problematic fast5 file so I can take a look? That's probably the fastest path to a resolution.

liuqianhn commented 3 years ago

The DNAscent index error only happened on barcode08 and barcode09 of your dataset 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100. Because the run terminates at the first error, I don't have a complete list of the failing fast5 files; these are the ones I found:

2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode09/32/ompi174_20180620_FAH65888_MN24299_sequencing_run_2018_06_18_CAM_ONT_gDNA_BrdU_BC_77372_read_14706_ch_55_strand.fast5
2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode08/35/ompi174_20180619_FAH65888_MN24299_sequencing_run_2018_06_18_CAM_ONT_gDNA_BrdU_BC_71118_read_21633_ch_22_strand.fast5
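Because index stops at the first failure, a separate pass over the whole directory can list every problem file at once. This sketch is a cheap stand-in for actually opening each file with HDF5: it only flags fast5 files that are empty, unreadable, or lack the 8-byte HDF5 signature, which the HDF5 file format places at offset 0, or at 512, 1024, 2048, ... when a user block is present. It will not catch a file that h5ls can open but whose contents are damaged, so an empty report does not rule out a bad download. The function names are hypothetical.

```python
import os
import sys

# The 8-byte signature that marks an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature appears at offset 0 or at one of
    the user-block offsets the format allows (512, 1024, 2048, ...)."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as handle:
        while offset + len(HDF5_SIGNATURE) <= size:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


def scan_fast5(root):
    """Walk root and return (path, reason) for every fast5 file that fails."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".fast5"):
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == 0:
                    failures.append((path, "empty file"))
                elif not looks_like_hdf5(path):
                    failures.append((path, "no HDF5 signature"))
            except OSError as error:
                failures.append((path, f"unreadable: {error.strerror}"))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, reason in scan_fast5(sys.argv[1]):
        print(f"{reason}: {path}")
```

For example: python scan_fast5.py 2018_06_18_CAM_ONT_gDNA_BrdU_40_60_80_100/workspace/pass/barcode09 (the script name is hypothetical).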

Meanwhile, when I run DNAscent detect, I get no error on the 2018_09_26_CAM_ONT_2085_1x_cell_cycle data but do get an error on 2018_09_18_CAM_ONT_2085_HU_1x. The fast5 file that cannot be opened is:

2018_09_18_CAM_ONT_2085_HU_1x/workspace/pass/49/ompi174_20180918_FAK12651_MN17319_sequencing_run_2018_10_17_CAM_ONT_2085_HU_1x_17983_read_6783_ch_353_strand.fast5

MBoemo commented 3 years ago

Okay thanks - my suspicion is that this is an issue with the data (either with the download from, or upload to, GEO) rather than the software itself. I'll see if I can reproduce this and get back to you, though I might not get to it until tomorrow.

liuqianhn commented 3 years ago

@MBoemo Thanks for your quick reply. It could be a download issue. But please note that I can use h5ls/h5dump to access and open the fast5 files that caused DNAscent index/detect to terminate in my runs.

MBoemo commented 3 years ago

I've taken a look at this and wasn't able to reproduce the issue. So that you can compare with your own workflow, these are the commands I used (with paths truncated for clarity):

ont-guppy/bin/guppy_basecaller -i /path/to/2018_09_18_CAM_ONT_2085_HU_1x -s /path/to/fast5_basecalled -r -c ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
cat fast5_basecalled/*.fastq > reads.fastq
minimap2-2.17_x64-linux/minimap2 -ax map-ont -a -o alignments.sam SacCer3.fasta reads.fastq
samtools view -Sb -o alignments.bam alignments.sam
samtools sort -o alignments.sorted.bam alignments.bam
samtools index alignments.sorted.bam
DNAscent index -f /path/to/2018_09_18_CAM_ONT_2085_HU_1x -s /path/to/fast5_basecalled/sequencing_summary.txt
DNAscent detect -b alignments.sorted.bam -i index.dnascent -r SacCer3.fasta -o test.detect

I did this both for the specific problematic fast5 file you identified and for the whole run, just to be safe, and both were okay. So at this point, I think it's unlikely to be caused by a DNAscent issue or anything on GEO's end. If you've followed the steps above and there's still a problem, my best guess is that there was an issue with your download.
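If a bad download is the suspect, comparing a problematic fast5 file byte-for-byte with a freshly downloaded copy of the same file settles the question. A minimal sketch using SHA-256 from Python's standard library; the thread doesn't say whether GEO publishes checksums for these files, so the reference here is assumed to be a second download rather than an official digest. The helper names are hypothetical.

```python
import hashlib
import sys


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if two files have identical SHA-256 digests."""
    return sha256sum(path_a) == sha256sum(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    print("identical" if same_content(sys.argv[1], sys.argv[2]) else "DIFFERENT")
```

For example: python compare_fast5.py original.fast5 redownloaded.fast5 (file and script names hypothetical). A "DIFFERENT" result for the same read points to a corrupted copy.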