GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)
GNU General Public License v3.0

How to format .bed generated by PASA to TAMA merge-acceptable .bed #115

Open Jung19911124 opened 11 months ago

Jung19911124 commented 11 months ago

Hi,

I am planning to integrate short-read data with Iso-Seq data using TAMA merge. I assembled the short-read data with Trinity and then aligned the assembled FASTA to the genome with PASA, but TAMA merge failed with an error saying the data were not acceptable.

How do I format the bed files generated by PASA? Any suggestion would be appreciated.

Best, Jung

GenomeRIK commented 11 months ago

Hi Jung,

What is the format you have as an output from PASA?

Thank you, Richard

Jung19911124 commented 7 months ago

I'm sorry for not replying sooner. First of all, the PASA output .bed was formatted as follows.

Chr1 10378786 10379194 ID=asmbl_27757 0 + 10378786 10379194 0 1 408 0
Chr2 4804691 4804903 ID=asmbl_52198 0 - 4804691 4804903 0 1 212 0
Chr3 6729418 6730507 ID=asmbl_29364 0 - 6729418 6730507 0 3 73,118,68 0,327,1021

After some trial and error, I edited the .bed as follows, and the error no longer occurred.

Chr1 10378786 10379194 ID=asmbl_27757;asmbl_27757.t1 0 + 10378786 10379194 0 1 408 0
Chr2 4804691 4804903 ID=asmbl_52198;asmbl_52198.t1 0 - 4804691 4804903 0 1 212 0
Chr3 6729418 6730507 ID=asmbl_29364;asmbl_29364.t1 0 - 6729418 6730507 0 3 73,118,68 0,327,1021

The problem seems to be that the name field (column 4) of the PASA output .bed files holds only a single ID and has no separate, unique identifier for each transcript after a semicolon. Again, I am very sorry for the delay in replying.
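
In case it helps anyone else, the manual edit above could probably be automated with something like the Python sketch below. It is only a minimal illustration, not part of TAMA or PASA: the function names `add_transcript_id` and `reformat_bed` and the `.t1` suffix are my own choices, and it assumes each assembly ID appears only once in the file (otherwise the appended IDs would not be unique). It splits on any whitespace and writes tab-separated output.

```python
def add_transcript_id(line, suffix=".t1"):
    """Rewrite column 4 of one BED line from 'ID=x' to 'ID=x;x<suffix>'.

    Hypothetical helper: mirrors the manual edit shown in this thread.
    """
    stripped = line.rstrip("\n")
    fields = stripped.split()
    # Leave blank lines, comments, and short (non-BED) lines untouched.
    if len(fields) < 4 or stripped.startswith("#"):
        return stripped
    name = fields[3]
    # Skip names that already contain a semicolon-separated second ID.
    if ";" not in name:
        bare = name[3:] if name.startswith("ID=") else name
        fields[3] = f"{name};{bare}{suffix}"
    return "\t".join(fields)


def reformat_bed(in_path, out_path, suffix=".t1"):
    """Apply add_transcript_id to every line of in_path, writing out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_transcript_id(line, suffix) + "\n")
```

For example, `reformat_bed("pasa_output.bed", "pasa_output.tama.bed")` would write the edited copy to a new file (both file names are placeholders) and leave the original untouched.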

Best, Jung