Add junction reads track

mpoelchau commented 4 years ago

Let's try to use regtools for this.

regtools junctions-extract function: https://regtools.readthedocs.io/en/latest/commands/junctions-extract/
parameters to specify:
- -m 20 (should match the min intron parameter from hisat)
- -s 0 (let's assume unstranded for now)
- -o [gggsss]_[assemblyname]_downsampled-RNA-Seq-alignments_[date].bed (output file should have the same name prefix as everything else, but with a .bed extension)
- input file should be the final, merged, indexed bam file
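The parameters above can be put together as a single command. This is a minimal sketch, assuming the `regtools junctions extract` subcommand form; `build_junctions_cmd` and the example file name are hypothetical, and the `-s 0` value follows the list above, so it is worth checking against the help output of the installed regtools version. The function only prints the command so it can be inspected before running.

```shell
#!/usr/bin/env bash
# Sketch: assemble the regtools call from the parameters listed above.
# build_junctions_cmd and the file names are illustrative placeholders.
build_junctions_cmd() {
    local bam="$1"                # final merged, sorted, indexed BAM
    local bed="${bam%.bam}.bed"   # same name prefix, .bed extension
    # -m 20: minimum intron length (should match the HISAT2 setting)
    # -s 0 : unstranded
    printf 'regtools junctions extract -m 20 -s 0 -o %s %s\n' "$bed" "$bam"
}

# Print the command for inspection; pipe the output to bash to run it.
build_junctions_cmd "example_downsampled-RNA-Seq-alignments.bam"
```

Deriving the .bed name from the .bam name keeps the output prefix identical to the other pipeline outputs.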
The output file needs to be moved over to our servers and added to trackList.json:
- perl flatfile-to-json.pl --bed OUTPUT-BED-FILE --trackLabel '[gggsss]_[assemblyname]_RNA-Seq-alignments_[date]_junctions' --config '{"style":{"showLabels": false}, "metadata": {BED METADATA BELOW}, "category":"RNA-Seq/Intronic splice junctions" }' --className feature3

BED METADATA: "Analysis provider": "i5k Workspace@NAL", "Analysis method": "https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/", "Data source":"[comma-delimited SRA ACCESSIONS from 'Submission' column in .tsv file]", "Publication status":"Analysis: NA; Source data: see individual SRA accessions", "Track legend":"Intronic junction reads generated by Hisat2 aligner and regtools"
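To show how the metadata slots into the `--config` argument, here is a rough sketch that builds the JSON string from a list of accessions. `build_track_config` is a hypothetical helper, and the accessions in the usage line are made-up placeholders, not real data.

```shell
#!/usr/bin/env bash
# Sketch: build the --config JSON for flatfile-to-json.pl.
# build_track_config is a hypothetical helper name.
build_track_config() {
    local accessions="$1"   # comma-delimited SRA accessions from the 'Submission' column
    # Unquoted heredoc: only ${accessions} is expanded, the rest is literal.
    cat <<EOF
{"style":{"showLabels": false}, "metadata": {"Analysis provider": "i5k Workspace@NAL", "Analysis method": "https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/", "Data source":"${accessions}", "Publication status":"Analysis: NA; Source data: see individual SRA accessions", "Track legend":"Intronic junction reads generated by Hisat2 aligner and regtools"}, "category":"RNA-Seq/Intronic splice junctions" }
EOF
}

# Placeholder accessions, for illustration only.
config="$(build_track_config "SRR0000001,SRR0000002")"
echo "$config"
```

The resulting string can then be passed as `--config "$config"` in the flatfile-to-json.pl call.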

HsiuKangHuang commented 4 years ago

Is the input file for this command the sorted bam file, or should I use the index file (the .bam.bai file) as input?

mpoelchau commented 4 years ago

The sorted and indexed bam file (.bam file). The regtools documentation doesn't specify how the corresponding .bai file is named - hopefully name.bam.bai, but sometimes some tools expect name.bai.
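A quick way to see which of the two naming conventions is present next to a given .bam is sketched below. `find_bam_index` is a hypothetical helper; it only checks for files and says nothing about which name regtools itself looks for.

```shell
#!/usr/bin/env bash
# Sketch: report which index file, if any, sits next to a BAM.
# find_bam_index is a hypothetical helper name.
find_bam_index() {
    local bam="$1"
    if [ -f "${bam}.bai" ]; then
        echo "${bam}.bai"            # name.bam.bai convention
    elif [ -f "${bam%.bam}.bai" ]; then
        echo "${bam%.bam}.bai"       # name.bai convention
    else
        echo "no index found for ${bam}" >&2
        return 1
    fi
}

# Usage (the .bam itself is still what gets passed to regtools):
#   find_bam_index merged.bam
```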

https://regtools.readthedocs.io/en/latest/commands/junctions-extract/

HsiuKangHuang commented 4 years ago

Right. I found that regtools can't use the .bam.bai file. Should I rename the .bam.bai file to something like "indexed-file.bam" before I run regtools, and rename it back to .bam.bai after regtools finishes?

mpoelchau commented 4 years ago

Can you try name.bai? E.g., if the .bam file is called name.bam, call the index file name.bai.

The input file for the regtools command would then be name.bam.

HsiuKangHuang commented 4 years ago

Do you mean I should rename filename.bam.bai to filename.bai and use it as input? I tried filename.bai, but regtools still couldn't open it. I also tried filename.bai.bam, and that didn't work either.

mpoelchau commented 4 years ago

I'm having trouble processing .bed files with the flatfile-to-json.pl script - I will update this issue when I figure it out.

mpoelchau commented 4 years ago

flatfile-to-json.pl (on our servers at least) doesn't work as expected after JBrowse 1.16.6. See https://github.com/GMOD/jbrowse/issues/1511 for a full description of the issue, and https://github.com/GMOD/jbrowse/issues/1511#issuecomment-636185863 for a suggested solution.

We can't implement the changes Colin recommends, since installing tabix and bgzip requires htslib, and it doesn't install on CentOS6 (or at least I haven't figured out how to; see also https://github.com/NAL-i5K/remap-gff3/issues/35).

For now we can still use our existing workflow with JBrowse 1.16.5 on our staging and prod sites. Once we move to CentOS8 + Apollo 2.6+, we can revisit this issue.
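For reference when this is revisited, a bgzip + tabix step for a BED file typically looks like the sketch below. It assumes htslib's `bgzip` and `tabix` are on the PATH and does not reproduce the linked comment; `sort_bed` and `index_bed` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: prepare a BED file for tabix-based access.
# sort_bed and index_bed are hypothetical helper names.

# Sort by sequence name, then numerically by start position
# (tabix needs position-sorted input).
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Compress and index; requires bgzip and tabix from htslib.
index_bed() {
    local bed="$1"
    sort_bed "$bed" | bgzip -c > "${bed}.gz"   # writes name.bed.gz
    tabix -p bed "${bed}.gz"                   # writes name.bed.gz.tbi
}

# Usage:
#   index_bed junctions.bed
```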

mpoelchau commented 3 years ago

We've migrated to CentOS8 - reopening this issue because we need to change addtrackList.py to use the setup Colin recommends.

mpoelchau commented 3 years ago

@g8tor I gave this a go, but I'll need some of your Python expertise for this...

NAL-i5K / NAL_RNA_seq_annotation_pipeline

Add junction reads track #34