GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)
GNU General Public License v3.0

tama_remove_single_read_models_levels.py does not remove transcripts #106

Closed: sanyalab closed this issue 10 months ago

sanyalab commented 1 year ago

Hi Richard,

I am using TAMA tools to consolidate different gene annotations of the same genome assembly. So far I have done the following:

1) TAMA-Collapse: Collapse each annotation BAM file with the TAMA Low processing parameters: "-d merge_dup -x no_cap -a 100 -m 10 -z 100 -sj sj_priority -lde 5 -sjt 20 -log log_off -b BAM"

2) TAMA-Merge: Merge all collapsed outputs into a single BED file. I set the merge priorities according to which annotations I have higher confidence in: "-f filelist.txt -p TamaLow -e common_ends -a 200 -m 10 -z 200 -d merge_dup"

3) Generate the read support file: "python tama_read_support_levels.py -f filelist.txt2 -m TamaLow_merge.txt -o Read_Supp"

4) Remove models supported by fewer than 2 sources: "python tama_remove_single_read_models_levels.py -b TamaLow.bed -r Read_Supp_read_support.txt -o Single_Model_Fil_2 -k remove_multi -l transcript -s 2"
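The expectation in step 4 can be illustrated with a minimal Python sketch of a transcript-level rule: keep a model only if at least `-s` distinct sources support it. This is not TAMA's implementation; the in-memory dictionary, the function name `filter_by_source_support`, and the IDs such as `G1.1` and `AnnotA` are hypothetical, and the real script reads this information from the read support file.

```python
# Hypothetical, simplified model of step 4 (not TAMA's code): keep a transcript
# only if it is supported by at least `min_sources` distinct sources (like -s 2).
from typing import Dict, Set


def filter_by_source_support(
    support: Dict[str, Set[str]], min_sources: int = 2
) -> Dict[str, Set[str]]:
    """Return only the transcripts backed by >= min_sources distinct sources."""
    return {tid: srcs for tid, srcs in support.items() if len(srcs) >= min_sources}


if __name__ == "__main__":
    # Made-up transcript IDs and source names, for illustration only.
    support = {
        "G1.1": {"AnnotA"},
        "G1.2": {"AnnotA", "AnnotB"},
        "G2.1": {"AnnotB", "AnnotC", "AnnotA"},
    }
    kept = filter_by_source_support(support, min_sources=2)
    print(sorted(kept))  # ['G1.2', 'G2.1']
```

Under this rule, every transcript with a single source would be dropped, which is why the surviving models below look surprising at first.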

I reviewed the read support file and found that, for the first 3 genes, every transcript is supported by only one source.

[Screenshot 1: read support file entries for the first 3 genes]

So they should be removed from the final BED file. However, the final BED file still contains them:

[Screenshot 2: final BED file still containing these transcripts]

Am I misinterpreting something? Please advise.

Thanks, Abhijit

sanyalab commented 1 year ago

Hi Richard,

A follow-up question: if I use TAMA-Merge directly to combine annotations, how do I generate the read support? I am unsure what to put in the filelist.txt file. I use </Path/2/annotation.bed>, and this format gives me an error.
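A bare path per line is likely what triggers the error, since the TAMA Merge file list is, as far as I know, tab-separated with more than one column. The sketch below assumes a four-column layout (file path, cap flag, comma-separated merge priority for start/junctions/end, source name); please verify that layout against the TAMA Merge documentation. The helper name `format_filelist`, the paths, the priority values, and the source names are all placeholders.

```python
# Hypothetical helper that builds a tab-separated TAMA Merge file list.
# ASSUMED column layout (verify against the TAMA Merge docs):
#   file path <TAB> cap flag <TAB> merge priority (start,junctions,end) <TAB> source name
from typing import Iterable, Tuple

Row = Tuple[str, str, str, str]


def format_filelist(rows: Iterable[Row]) -> str:
    """Return one tab-separated line per input annotation."""
    lines = []
    for row in rows:
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        if any("\t" in field or "\n" in field for field in row):
            raise ValueError(f"fields must not contain tabs or newlines: {row!r}")
        lines.append("\t".join(row) + "\n")
    return "".join(lines)


if __name__ == "__main__":
    rows = [
        # Placeholder paths, priorities, and source names.
        ("/path/to/annotA.bed", "no_cap", "1,1,1", "AnnotA"),
        ("/path/to/annotB.bed", "no_cap", "2,2,2", "AnnotB"),
    ]
    with open("filelist.txt", "w") as handle:
        handle.write(format_filelist(rows))
```

Validating the column count up front surfaces a malformed row before TAMA Merge ever reads the file.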

Thanks, Abhijit

GenomeRIK commented 10 months ago

Hi Abhijit,

Regarding your first question, the gene IDs and transcript IDs are changed to consolidate the numbering. Please use the "singleton_report.txt" file to see the mapping of IDs.
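Because the IDs are renumbered, a model that appears under the same ID in the filtered BED file may not be the model you checked in the read support file. The minimal Python sketch below translates the old IDs through an ID map before checking. Assumptions: the BED name (column 4) has the form `gene;transcript`, and `id_map` is a hard-coded dictionary standing in for the mapping in "singleton_report.txt", whose exact columns this sketch does not assume. Function names and IDs are hypothetical.

```python
# Hypothetical sketch: check whether old transcript IDs survived filtering,
# after translating them to the renumbered IDs used in the final BED file.
from typing import Dict, Iterable, Optional, Set


def bed_transcript_ids(bed_lines: Iterable[str]) -> Set[str]:
    """Collect transcript IDs from BED column 4, ASSUMING names like 'G1;G1.1'."""
    ids = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name = line.rstrip("\n").split("\t")[3]
        ids.add(name.split(";")[-1])
    return ids


def find_in_final(
    old_ids: Iterable[str], id_map: Dict[str, str], final_ids: Set[str]
) -> Dict[str, Optional[str]]:
    """Map each old ID to its new ID if that new ID is in the final BED, else None."""
    result: Dict[str, Optional[str]] = {}
    for old in old_ids:
        new = id_map.get(old)
        result[old] = new if new in final_ids else None
    return result


if __name__ == "__main__":
    bed = ["chr1\t100\t900\tG7;G7.2\t40\t+\n"]  # made-up final BED line
    id_map = {"G1.1": "G7.1", "G1.2": "G7.2"}  # stand-in for the report mapping
    print(find_in_final(["G1.1", "G1.2"], id_map, bed_transcript_ids(bed)))
```

In this toy case, `G1.1` maps to `None` (removed) while `G1.2` survives as `G7.2`, so comparing raw IDs across the two files would be misleading.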

As for your second question about TAMA Merge: you need read support files for each step of annotation generation, because the read support of the inputs is what is used to generate the next level of read support.
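The dependency between levels can be pictured with a small Python model: a merged model only records which lower-level models it absorbed, so its supporting reads (or sources) must be looked up in the lower level's read support. This is a hypothetical illustration of the idea, not how tama_read_support_levels.py is implemented; the function `next_level_support` and all IDs are made up.

```python
# Hypothetical model of multi-level read support (not TAMA's implementation).
# A merged model knows only which lower-level models it absorbed, so its
# support must be assembled from the lower level's read support.
from typing import Dict, List, Set


def next_level_support(
    lower_support: Dict[str, Set[str]],
    merged_from: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Union the supporting reads of every lower-level model in each merged model."""
    result: Dict[str, Set[str]] = {}
    for merged_id, members in merged_from.items():
        reads: Set[str] = set()
        for member in members:
            if member not in lower_support:
                # Without input read support there is nothing to propagate.
                raise KeyError(f"no read support for input model {member!r}")
            reads |= lower_support[member]
        result[merged_id] = reads
    return result


if __name__ == "__main__":
    level1 = {"A.1": {"r1", "r2"}, "B.1": {"r3"}}  # e.g. collapse-level support
    merged = {"M1.1": ["A.1", "B.1"]}  # e.g. merge-level membership
    print({k: sorted(v) for k, v in next_level_support(level1, merged).items()})
```

Raising on a missing input makes the requirement explicit: an annotation merged without its own read support file leaves a gap that cannot be filled at the next level.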

Thank you, Richard