Open dudududu12138 opened 1 month ago
Dear @dudududu12138
Best Andrey
Dear @dudududu12138
- Could you send me the files? This looks unexpected.
- Currently, if you provide several BAM files, they will treated as distinct samples within the same experiment. Thus, a single output folder with a single transcript_models.gtf files will be created. "Per-file" count/TPM tables are stored in _grouped files. Combined tables are now created only if you provide 2 or more experiments via YAML dataset file.
Best Andrey
Sorry for the late reply. Below are the OUT.transcript_model_counts.tsv
and OUT.transcript_models.gtf
files. There are more transcripts in the gtf file. That is ,the transcripts in *counts.tsv
are subset of transcripts in *models.gtf
. It is odd.
OUT.transcript_models.gtf.gz
OUT.transcript_model_counts.tsv.gz
Moreover, I noticed that in my result, OUT.transcript_model_reads.tsv.gz
and OUT.transcript_models.gtf
have the same number of transcripts. While OUT.transcript_model_tpm.tsv
and OUT.transcript_model_counts.tsv
have the same number of transcripts and they are the subset of OUT.transcript_models.gtf
By the way, I checked myself. I extracted an example of a transcript that is reported in the OUT.transcript_models.gtf
file but not in the OUT.transcript_model_counts.tsv
file.
Below is the transcript id and read id that support the transcript that extracted from OUT.transcript_model_reads.tsv.gz
.
Then I extracted the these reads from
OUT.read_assignments.tsv.gz
and saved them into file(example.read.assignment.txt.)
It is odd that some reads that support the novel transcript are full splice match and uniquely assigned to the reference transcript. So I am very confused.
Is the OUT.transcript_model_counts.tsv
just counts the reads that uniquely mapped to the transcript?
@dudududu12138
Yes, this is really odd. Could you also send me the full log file?
@dudududu12138
Here is what I found so far. Output GTF and supporting reads are likely correct. However, expression estimation algorithm is independent from transcript discovery and for some reason assigns 0 to some of the transcripts.
Transcripts with 0 counts/TPMs are not printed to transcript_model_counts.tsv
. Hence the difference.
I will now check why expression estimation algorithm assigns 0 to these transcripts despite the fact they are supported by reads.
Best Andrey
Hi, I recently used isoquant(version 3.4.1) to detect isoforms from multiple samples. But the results are a little odd. Below are my quaetions:
OUT.transcript_models.gtf
while 166857 inOUT.transcript_model_tpm.tsv
.isoquant.py -t 30 \ --reference $ref \ --genedb $anno --complete_genedb \ --bam $samples \ --data_type pacbio_ccs \ -o $out/