WGLab / DeepRepeat

An accurate repeat detection from Nanopore data using deep learning and image techniques

Issue on summary file and generating f5index #13

Open HLHsieh opened 1 year ago

HLHsieh commented 1 year ago

Hi there,

I am trying to run DeepRepeat on a set of simulated data through the Docker image, using the following command.

singularity run docker://genomicslab/deeprepeat:0.1.3 DeepRepeat.py Detect --bam myseq.bam --o output --repeat $repeat --repeatName $repeatName --UniqueID myseq --f5folder ./fast5

However, I got an error indicating the following:

Error happen!
         /scratch/stimulated_c3/fast5/sequencing_summary.txt does not exist. Should specified by --summary_file.

This is simulated data, so I do not have a sequencing_summary.txt. Is a sequencing_summary.txt required, or are there other options?

Please advise. Thank you!

Best, Hsin

liuqianhn commented 1 year ago

@HLHsieh When you run docker, you need to use -v to specify the mapping of your input folder.

HLHsieh commented 1 year ago

Hi @liuqianhn,

Thanks for your prompt response. I fixed the summary file issue by adding --summary_file fastq/sequencing_summary.txt, but I now get another error:

INFO:    Using cached SIF image
The following options are used (included default):
            UniqueID    (myseq);
                 bam    (myseq.bam);
     basecalled_path    (fastq/workspace/pass/);
            f5config    (/app/data/config/fast5_path.config);
            f5folder    (fast5);
                 f5i    (fast5/f5.f5index);
        f5i_basefile    (f5);
         feature_num    (50);
          label_size    (4);
           merge_gap    (9.0);
            mod_path    (None);
         mod_version    (2);
             multif5    (0);
             nb_size    (3);
              nbsize    (-1.5);
              outlog    (0);
        outputfolder    (output);
                 pcr    (False);
              repeat    (chr9:27573494-27573709:CCCCGG:3);
         repeat_name    (C9ORF72);
          repeat_pat    (CCCCGG);
                 rpg    (/app/data/trf.v0.bed);
        summary_file    (fastq/sequencing_summary.txt);
Generating f5index: /app/scripts/IndexF5files fast5 fastq/workspace/pass/ f5 fastq/sequencing_summary.txt 0
Error!! Cannot generate index file for fast5/f5.f5index.

As you mentioned in the other post, DeepRepeat will generate f5.f5index automatically. Please advise on how to fix this.

This is the command I ran:

singularity run docker://genomicslab/deeprepeat:0.1.3 DeepRepeat.py Detect --bam myseq.bam --o output --repeat $repeat --repeatName $repeatName --UniqueID myseq --basecalled_path fastq/workspace/pass/ --f5folder ./fast5 --summary_file fastq/sequencing_summary.txt

Many thanks, Hsin
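One way to narrow down an index-generation failure like this is to confirm that every input path in the command actually resolves from the directory where it is launched. The following is only a sketch: check_inputs is a hypothetical helper, not part of DeepRepeat, and it only tests that each path exists. It says nothing about whether the files have the layout that IndexF5files expects.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of DeepRepeat):
# prints one line per path and returns non-zero if any path is absent.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the paths from the command in this thread:
#   check_inputs myseq.bam fast5 fastq/workspace/pass fastq/sequencing_summary.txt
#
# To list the same paths as seen from inside the container, Singularity's
# exec subcommand can run ls in the image:
#   singularity exec docker://genomicslab/deeprepeat:0.1.3 ls -ld fast5 fastq/workspace/pass
```

If a path is reported on the host but not inside the container, the directory is probably not being mounted into the container.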

liuqianhn commented 1 year ago

@HLHsieh Sorry for the late reply. The problem is that folders inside the container are not the same as folders on your machine: you are passing local paths, but the command resolves them inside the container. You therefore need to use -v to map your local folders to folders in the container. For example:

singularity run docker://genomicslab/deeprepeat:0.1.3 -v $(pwd):/my_dataset  DeepRepeat.py Detect --bam /my_dataset/myseq.bam --o output --repeat $repeat --repeatName $repeatName --UniqueID myseq --basecalled_path /my_dataset/fastq/workspace/pass/ --f5folder /my_dataset/fast5 --summary_file /my_dataset/fastq/sequencing_summary.txt

where your current local folder contains myseq.bam, fast5/ and fastq/ (with fastq/workspace/pass/ and fastq/sequencing_summary.txt); inside the container these appear under /my_dataset.
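Note that -v is the volume flag of docker run. When the image is launched through Singularity, as in this thread, the bind option is --bind (short form -B), and it has to appear before the image name because everything after the image is passed to the program inside the container. A minimal sketch of the same mapping for Singularity follows. The function name deeprepeat_detect and the DRY_RUN switch are hypothetical conveniences, and /my_dataset is an arbitrary mount point. The DeepRepeat arguments are copied from the commands above and have not been verified against the tool.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the Detect call so that a host directory is bound
# into the container. Expects $repeat and $repeatName to be set, as in
# the commands above. Set DRY_RUN=1 to print the command instead of running it.
deeprepeat_detect() {
    local host_dir=$1                # directory holding myseq.bam, fast5/ and fastq/
    local mount_point=/my_dataset    # arbitrary path inside the container
    local launcher=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        launcher=echo
    fi

    # --bind comes before the image; arguments after the image go to the
    # program inside the container.
    $launcher singularity run --bind "${host_dir}:${mount_point}" \
        docker://genomicslab/deeprepeat:0.1.3 \
        DeepRepeat.py Detect \
        --bam "${mount_point}/myseq.bam" \
        --o output \
        --repeat "$repeat" --repeatName "$repeatName" \
        --UniqueID myseq \
        --basecalled_path "${mount_point}/fastq/workspace/pass/" \
        --f5folder "${mount_point}/fast5" \
        --summary_file "${mount_point}/fastq/sequencing_summary.txt"
}
```

With repeat and repeatName set as in the log above (chr9:27573494-27573709:CCCCGG:3 and C9ORF72), calling deeprepeat_detect "$(pwd)" from the data directory runs the command, and prefixing the call with DRY_RUN=1 shows it first.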