tama_remove_single_read_models_levels.py does not remove transcripts

sanyalab commented 1 year ago

Hi Richard,

I am using TAMA tools to consolidate different gene annotations of the same genome assembly. So far I have done the following:

1) TAMA-Collapse: Collapse each annotation BAM file with TAMA Low processing parameters. "-d merge_dup -x no_cap -a 100 -m 10 -z 100 -sj sj_priority -lde 5 -sjt 20 -log log_off -b BAM"

2) TAMA-Merge: Merge all Collapsed output BAM files into a single bed file. I set the merge_priority according to which annotations I have higher confidence in. "-f filelist.txt -p TamaLow -e common_ends -a 200 -m 10 -z 200 -d merge_dup"

3) Generate the Read Support file: "python tama_read_support_levels.py -f filelist.txt2 -m TamaLow_merge.txt -o Read_Supp"

4) Remove models supported by fewer than 2 of the sources: "python tama_remove_single_read_models_levels.py -b TamaLow.bed -r Read_Supp_read_support.txt -o Single_Model_Fil_2 -k remove_multi -l transcript -s 2"

I reviewed the Read support file, and found that for the first 3 genes, all transcripts have just one source of support.

[Image: Capture1]

So they should have been removed from the final bed file. However, the final bed file still contains them.

[Image: Capture2]

Am I interpreting something wrong? Please advise.

Thanks, Abhijit

sanyalab commented 1 year ago

Hi Richard,

A follow-up question: if I use TAMA-Merge directly to combine annotations, how do I generate the read support? I am unsure what to put in the filelist.txt file. I use </Path/2/annotation.bed>, but this format gives me an error.

Thanks, Abhijit

GenomeRIK commented 10 months ago

Hi Abhijit,

Regarding your first question, the gene IDs and transcript IDs are changed to consolidate numbering, so the IDs in the filtered bed file will not match those in the read support file. Please use the "singleton_report.txt" file to see the mapping of IDs.
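
For anyone who wants to script that cross-check, here is a minimal sketch. It assumes the ID mapping has already been exported to a two-column, tab-separated form (old ID, then new ID); the actual layout of "singleton_report.txt" is not shown in this thread, so the column positions, the function names `load_id_map` and `translate_ids`, and the example IDs are all hypothetical.

```python
import csv
import io


def load_id_map(handle, old_col=0, new_col=1):
    """Read a tab-separated old-ID -> new-ID table.

    The column positions are assumptions; adjust them to the real report.
    """
    id_map = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short rows, and comment-style lines.
        if len(row) <= max(old_col, new_col) or row[0].startswith("#"):
            continue
        id_map[row[old_col]] = row[new_col]
    return id_map


def translate_ids(old_ids, id_map):
    """Return (renamed, missing): old IDs that were renumbered, and old IDs
    that have no entry in the mapping at all."""
    renamed = {old: id_map[old] for old in old_ids if old in id_map}
    missing = sorted(set(old_ids) - set(id_map))
    return renamed, missing


# Made-up example data, not taken from the thread.
example = "old_G1.1\tnew_G1.1\nold_G4.1\tnew_G2.1\n"
id_map = load_id_map(io.StringIO(example))
renamed, missing = translate_ids(["old_G4.1", "old_G9.1"], id_map)
print(renamed)
print(missing)
```

With a mapping like this, a transcript that seems to have "survived" filtering can be traced back to its pre-filter ID before comparing it against the read support file.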

As for the second question about TAMA Merge, you need to have read support files for each step of annotation generation. In other words, you need the read support files from the previous level as input to generate the next level of read support.

Thank you, Richard

GenomeRIK / tama

tama_remove_single_read_models_levels.py does not remove transcripts #106