andreaminio / AnnotationPipeline-EVM_based-DClab

Cantù Lab @ UC Davis - Annotation pipeline - EVM based

Questions about transcript model selection #7

Closed yaoxkkkkk closed 8 months ago

yaoxkkkkk commented 8 months ago

Hi, I am working through section 2.a.2.1.3, but my RNAseq_assembly.check.perfect_intron_chain.perfect_cds_match.txt contains "FALSE" in every row, so I can't generate any of the downstream files. Here is the log from counting the rows that contain "FALSE" and "TRUE":

$ cat HD_hap1.RNAseq_assembly.check.perfect_intron_chain.perfect_cds_match.txt | wc -l
11189

$ cat HD_hap1.RNAseq_assembly.check.perfect_intron_chain.perfect_cds_match.txt | grep "FALSE" | wc -l
11189

$ cat HD_hap1.RNAseq_assembly.check.perfect_intron_chain.perfect_cds_match.txt | grep "TRUE" | wc -l
1128
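
A note on reading these counts: `grep` counts a row if the string appears anywhere on the line, so 11189 rows matching "FALSE" and 1128 rows matching "TRUE" out of 11189 total would mean that 1128 rows contain both values, presumably in different columns. A per-column tally may be more informative than whole-line counts. The sketch below assumes the file is tab-separated, which this thread does not confirm, and `count_flags` is a hypothetical helper name, not part of the pipeline:

```shell
# Tally TRUE/FALSE values per column of a check table.
# Assumption: columns are tab-separated (not confirmed in this thread).
count_flags() {
  awk -F'\t' '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "TRUE")  t[i]++
        if ($i == "FALSE") f[i]++
        if (i > maxnf) maxnf = i
      }
    }
    END {
      # Report only columns that actually hold TRUE/FALSE flags.
      for (i = 1; i <= maxnf; i++)
        if (t[i] + f[i] > 0)
          printf "column %d: TRUE=%d FALSE=%d\n", i, t[i], f[i]
    }
  ' "$1"
}
```

It could be run as `count_flags HD_hap1.RNAseq_assembly.check.perfect_intron_chain.perfect_cds_match.txt`. If one flag column is FALSE in every row while another has 1128 TRUE values, that would point at which of the checks is failing.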

I don't know how to deal with this; any suggestions would be appreciated!

yaoxkkkkk commented 8 months ago

I removed the StringTie genome-guided transcripts, and now I get 1097 sequences from the two Trinity RNA assemblies that are considered to perfectly match the intron/exon structure and CDS sequences. Is it acceptable to do this? Also, how many genes are needed to create a training set?