dieterich-lab / JACUSA2

New version of JACUSA -> 2.0
GNU General Public License v3.0
21 stars 3 forks source link

JACUSA_to_TRIBE.pl obsolete? #45

Open ckuenne opened 3 years ago

ckuenne commented 3 years ago

Is the JACUSA_to_TRIBE.pl script found here (https://github.com/dieterich-lab/tribe-workflow/tree/master/RRD_workflow) still working for the JACUSA2 output format? Or is that functionality supposed to be superseded by JACUSA2helper? If so, can you suggest a workflow? Since TRIBE was one of the original selling points for JACUSA, an automation or tutorial for this specific pipeline might be of wider interest.

OR should we rather use JACUSA2 directly with -B A2G? I do have stranded paired-end reads, so that should theoretically work, right? But there I run into an error (it works without -B):

JACUSA2 Version: 2.0.1 call-2 -a D,M,Y -filterNM 5 -s -c 2 -T 1 -P RF-FIRSTSTRAND -B A2G -p 16 -r x/jacusa_ht_vs_m3d-ht.A2G.new.org ./star_ht_1/igv/ht_1.bam,./star_ht_2/igv/ht_2.bam,./star_ht_3/igv/ht_3.bam,./star_ht_4/igv/ht_4.bam ./star_m3d-ht_1/igv/m3d-ht_1.bam,./star_m3d-ht_2/igv/m3d-ht_2.bam,./star_m3d-ht_3/igv/m3d-ht_3.bam,./star_m3d-ht_4/igv/m3d-ht_4.bam
--------------------------------------------------------------------------------
INFO    00:00:02  Thread 15: Working on contig chr1:3008197-3108196
INFO    00:00:02  Thread 14: Working on contig chr1:3108523-3208522
INFO    00:00:02  Thread 13: Working on contig chr1:3208523-3308522
INFO    00:00:02  Thread 12: Working on contig chr1:3308523-3408522
INFO    00:00:02  Thread 11: Working on contig chr1:3408523-3508522
INFO    00:00:02  Thread 10: Working on contig chr1:3508523-3608522
ERROR   00:00:02  Problem with read: NS500475:556:HHNVMBGXC:2:23111:24722:17846 in ./star_ht_2/igv/ht_2.bam
java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "htsjdk.samtools.SAMRecord.getIntegerAttribute(String)" is null
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.collectBaseSubs(BaseSubRecordProcessor.java:258)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processPE(BaseSubRecordProcessor.java:218)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processInternalPE(BaseSubRecordProcessor.java:209)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processPE(BaseSubRecordProcessor.java:184)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.process(BaseSubRecordProcessor.java:238)
        at lib.data.storage.container.UnstrandedCacheContainter.process(UnstrandedCacheContainter.java:50)
        at lib.data.storage.container.AbstractStrandedCacheContainer.process(AbstractStrandedCacheContainer.java:83)
        at lib.data.storage.container.RFPairedEnd1CacheContainer.process(RFPairedEnd1CacheContainer.java:1)
        at lib.data.assembler.SiteDataAssembler.buildCache(SiteDataAssembler.java:56)
        at lib.util.ReplicateContainer.createIterators(ReplicateContainer.java:49)
        at lib.util.ConditionContainer.lambda$0(ConditionContainer.java:29)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at lib.util.ConditionContainer.updateWindowCoordinates(ConditionContainer.java:29)
        at lib.worker.AbstractWorker.updateReservedWindowCoordinate(AbstractWorker.java:170)
        at lib.worker.AbstractWorker.processInit(AbstractWorker.java:186)
        at lib.worker.AbstractWorker.run(AbstractWorker.java:218)

It was mapped with STAR 2.7.9a (including the MD field) and the offending read pair is this one:

NS500475:556:HHNVMBGXC:2:23111:24722:17846  163 chr1    3016524 255 75M =   3016637 187 CTGGGTCTTCAATTCTGTTACATTGGTCTACTTGTCTGTCACTATACCAGTACCATGCAGTTTTGATAACAATTG AA//AEE/EEEE//EEEEE//EEE/EE6/EEE/A//</EE6/EA///EEE/EE/EEEEE/<AEE<EE//<<E/E< NH:i:1  HI:i:1  AS:i:145    nM:i:1  MD:Z:67C7
NS500475:556:HHNVMBGXC:2:23111:24722:17846  83  chr1    3016637 255 74M =   3016524 -187    CCAGAGGTTCTTTTATCCTTGAGAAGAGTTTTTGCTATCCTCGTTTTTTTGTTATTCCAGATGAATCTGCCGAT  </A<<A<//66EEA///A<E/<A/EEEEEEEEEEEEEEEEE///EEEEEE/EEEAE/E/EEEAAAEAEE6AAAA  NH:i:1  HI:i:1  AS:i:145    nM:i:1  MD:Z:74

I tried different Java versions: oracle_jre_1.8.0_211, open_jdk_15.0.2, open_jdk_16.0.1.

piechottam commented 3 years ago

Dear ckuenne,

Thank you for your feedback and detailed problem description!

  1. NullPointerException The SAM Output that you provided, suggests that STAR does not adhere to recommended practices for SAM format https://samtools.github.io/hts-specs/SAMtags.pdf The required "NM" tag is not present in the SAM file. According to STAR manual (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): [...] The SAM attributes can be specified by the user using --outSAMattributes A1 A2 A3 ... option which accept a list of 2-character SAM attributes. The implemented attributes are: NH HI NM MD AS nM jM jI XS. By default, STAR outputs NH HI AS nM attributes. [...] I am afraid, you will need to rerun STAR with "--outSAMattributes NM NH [...and what ever you need...]"

    STAR output has the nonstandard field "nM". From SAMtags.pdf: [...] Note that tags starting with ‘X’, ‘Y’, or ‘Z’ and tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this [...]

  2. JACUSA_to_TRIBE.pl The person responsible for this PERL script is on vacation... as soon as he is back, I'll ask him to provide feedback. There are some workflows on https://dieterich-lab.github.io/JACUSA2helper/ (Articles). In general, we aim to add "useful" methods to JACUSA2helper.

ckuenne commented 3 years ago

Dear piechottam,

Thanks for the quick response! I have rerun STAR with "--outSAMattributes NH HI AS nM NM MD" and that seems to have solved the issue. JACUSA2 is currently running with -B A2G. I might be back with more feedback once that is done.

And maybe it would make sense to implement a small check for the BAM format in JACUSA. STAR is a pretty standard tool to use after all. And the default exception thrown is not really helpful to diagnose this.

ckuenne commented 3 years ago

another question to JACUSA_to_TRIBE.pl: does it only work for stranded_forward or also for stranded_reverse libraries? if i understood correctly, jacusa already takes care of that using -P FR-SECONDSTRAND / RF-FIRSTSTRAND = the jacusa output is relative to the original RNA/DNA, not to the sequencing. so the perl script does not have deal with strandedness again?