dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

DCC crashes with -fg set and -M unset #42

Closed: tjakobi closed this issue 6 years ago

tjakobi commented 7 years ago

DCC crashes when using the following command line:

    DCC @samplesheet -mt1 @mate1 -mt2 @mate2 -T 8 -D -N -an /data/yaoyh/circRNA/GTF_file/gencode.v26.annotation.gtf -Pi -F -fg -G -A /data/yaoyh/circRNA/Homo_sapiens.GRCh38.dna.primary_assembly.fa -Nr 1 2

The following error is thrown:

The issue seems to be related to the -M and -fg filter flags.

Yaoyinghhao commented 7 years ago

Hi tjakobi, attached are the samplesheet, mate1, and mate2 files. For upload purposes, I added .txt as a suffix to these three files.

mate1.txt mate2.txt samplesheet.txt

tjakobi commented 7 years ago

Dear @Yaoyinghhao,

I've looked into the DCC code for more insight. If you are using the latest DCC version from GitHub, then line 356 of the main script reads as follows:

        if not options.chrM and not options.filterbygene:
            filt.sortOutput(options.tmp_dir + "tmp_unsortedWithChrM", output_circ_counts,
                            output_coordinates, samplelist)

That means this code should only be executed when neither -M nor -fg is specified. Therefore, this line should never run if you specify -fg or -M on the command line.
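To make the truth table of that guard concrete, here is a minimal sketch of the same condition. It assumes, as the quoted snippet suggests, that the parsed options expose boolean attributes `chrM` (set by `-M`) and `filterbygene` (set by `-fg`). The `argparse` parser and the function names below are hypothetical stand-ins, not DCC's actual argument handling.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for DCC's real parser: it defines only the two
    # flags discussed in this issue, using the attribute names from the snippet.
    parser = argparse.ArgumentParser(prog="dcc-guard-sketch")
    parser.add_argument("-M", dest="chrM", action="store_true")
    parser.add_argument("-fg", dest="filterbygene", action="store_true")
    return parser


def runs_unfiltered_sort(argv: Sequence[str]) -> bool:
    # Same condition as the quoted guard: True only when neither -M nor -fg is set.
    options = build_parser().parse_args(list(argv))
    return not options.chrM and not options.filterbygene


if __name__ == "__main__":
    for argv in ([], ["-M"], ["-fg"], ["-M", "-fg"]):
        print(" ".join(argv) or "(no flags)", "->", runs_unfiltered_sort(argv))
```

Only the run without either flag reaches the sort call; any combination that includes -M or -fg skips it, which is why a crash in that line points at the flags not being parsed as expected.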

However, you wrote that the command line you used contains the -fg parameter. Could you please verify that you indeed used -fg and not -M in the command line? If possible you may upload the DCC log file that contains the complete command line.

Cheers, Tobias

Yaoyinghhao commented 7 years ago

Dear Tobias,

Following your reply, I noticed that the GTF file I used was not correctly formatted. So I re-downloaded the latest version of the GTF file and ran DCC again, but I still got an error message.

Attached are the first 100 lines of the GTF file that I used in the second run and the corresponding log file.
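Since a badly formatted GTF file was the first suspect, a quick sanity check of its first lines can save a round trip. The sketch below assumes only the standard GTF layout: nine tab-separated columns, 1-based start not greater than end, strand `+`, `-` or `.`, and a `gene_id` attribute. `check_gtf_lines` is a hypothetical helper, not part of DCC, and it does not replicate how DCC or HTSeq actually parse the file.

```python
import re
import sys
from typing import Iterable, List, Tuple

VALID_STRANDS = {"+", "-", "."}
GENE_ID = re.compile(r'gene_id "[^"]+"')


def check_gtf_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that break basic GTF layout."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines such as "##format: gtf"
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((lineno, f"expected 9 tab-separated fields, got {len(fields)}"))
            continue
        start, end, strand, attributes = fields[3], fields[4], fields[6], fields[8]
        if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
            problems.append((lineno, "start/end are not valid 1-based coordinates"))
        if strand not in VALID_STRANDS:
            problems.append((lineno, f"invalid strand {strand!r}"))
        if not GENE_ID.search(attributes):
            problems.append((lineno, "no gene_id attribute"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_gtf.py gencode.v26.annotation.gtf100line.txt
    with open(sys.argv[1]) as handle:
        for lineno, problem in check_gtf_lines(handle):
            print(f"line {lineno}: {problem}")
```

An empty result only means the lines follow the basic layout; it does not rule out other mismatches, such as chromosome names in the GTF differing from those in the genome FASTA.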

Thank you. Yinghao

Log file from the second run:

    2017-08-28 08:39:43,163 DCC 0.4.4 started
    2017-08-28 08:39:43,163 DCC command line: /home/yaoyh//.local/bin/DCC @samplesheet -mt1 @mate1 -mt2 @mate2 -T 16 -D -an /data/yaoyh/circRNA/GTF_file/gencode.v26.annotation.gtf -Pi -M -F -fg -G -Nr 1 1 -A /data/yaoyh/circRNA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    2017-08-28 08:39:43,164 Input file names have duplicates, add number suffix in input order to output files for distinction
    2017-08-28 08:39:43,242 Starting to detect circRNAs
    2017-08-28 08:39:43,242 Stranded data mode
    2017-08-28 08:39:43,242 Please make sure that the read pairs have been mapped both, combined and on a per mate basis
    2017-08-28 08:39:43,243 Collecting chimera information from mates-separate mapping
    2017-08-28 08:40:03,298 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.CP85IG
    2017-08-28 08:40:03,299 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.ML4PVW
    2017-08-28 08:40:03,300 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.HJDR5I
    2017-08-28 08:40:03,301 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.JW8Y8E
    2017-08-28 08:40:03,302 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.3AKSF5
    2017-08-28 08:40:03,302 started circRNA detection from file _tmp_DCC/Chimeric.out.junction.6ZF9ND
    2017-08-28 08:44:29,310 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.HJDR5I
    2017-08-28 08:45:16,387 Read 96907264.-.96907748.SRR5398218.24236944 has more than 2 count.
    2017-08-28 08:45:16,396 Read 96907264.-.96907748.SRR5398218.24236944 has more than 2 count.
    2017-08-28 08:46:12,574 Read 96907264.-.96907748.SRR5398218.24236944 has more than 2 count.
    2017-08-28 08:49:00,789 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.CP85IG
    2017-08-28 08:53:50,006 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.6ZF9ND
    2017-08-28 08:54:51,829 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.3AKSF5
    2017-08-28 09:04:59,162 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.ML4PVW
    2017-08-28 10:48:46,647 finished circRNA detection from file _tmp_DCC/Chimeric.out.junction.JW8Y8E
    2017-08-28 10:48:46,650 Combining individual circRNA read counts
    2017-08-28 10:50:07,721 Write in annotation
    2017-08-28 10:50:07,722 Select gene features in Annotation file
    2017-08-28 11:00:42,871 Filtering started
    2017-08-28 11:00:42,871 Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
    2017-08-28 11:00:48,336 Filtering by read counts
    2017-08-28 11:00:50,976 Deleting circRNA candidates from mitochondrial chromosome
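The last logged step, "Deleting circRNA candidates from mitochondrial chromosome", is the one enabled by -M. As a rough illustration of what such a filter does, here is a sketch that drops mitochondrial candidates from tab-separated coordinate rows. The row layout (chromosome in the first column) and the set of mitochondrial names are assumptions for the example: GENCODE/UCSC references use `chrM`, Ensembl references use `MT`. `drop_mitochondrial` is a hypothetical helper; DCC's own implementation may differ.

```python
from typing import Iterable, List

# Assumed spellings of the mitochondrial chromosome; adjust to your reference.
MITO_NAMES = {"chrM", "MT"}


def drop_mitochondrial(rows: Iterable[str]) -> List[str]:
    """Keep only tab-separated rows whose first column is not a mitochondrial chromosome."""
    kept = []
    for row in rows:
        chrom = row.split("\t", 1)[0]
        # Exact name match, so contigs whose names merely contain "M" are kept.
        if chrom not in MITO_NAMES:
            kept.append(row)
    return kept
```

Matching exact names rather than substrings keeps unrelated contigs, but it also means the set must list whatever spelling the annotation and genome actually use.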

tjakobi commented 7 years ago

Dear @Yaoyinghhao,

I can only see the attached log file and no GTF file. What error message are you receiving? Is it still the one from circAnnotate.py / HTSeq?

Cheers, Tobias

Yaoyinghhao commented 7 years ago

Dear Tobias,

The attached file is the GTF file I used. The error message I got is shown in the following screenshot.

Cheers, Yinghao


tjakobi commented 7 years ago

Dear @Yaoyinghhao,

Are you sure you attached the screenshot and the GTF file? I can't see any attachments, neither on GitHub nor in the GitHub notification.

You may want to try an upload service like https://www.file.io/ instead.

Cheers, Tobias

Yaoyinghhao commented 7 years ago

DCC-2017-08-28_08-39.log.txt error-message gencode.v26.annotation.gtf100line.txt

Yaoyinghhao commented 7 years ago

Can you see them now?

tjakobi commented 7 years ago

Yes, thank you for providing the files. However, I am not yet able to reproduce the error. I will post any updates here.

tjakobi commented 6 years ago

Dear @Yaoyinghhao,

Sorry for the delay. I'm still having a hard time reproducing your error. Just to make sure we're on the same page:

Cheers, Tobias

tjakobi commented 6 years ago

Actually, this error should have been fixed in #33. Closing for now.