CBIIT / TULIP

Classifying RNA-seq samples into different tumor types.
GNU General Public License v3.0

Input format confusion #6

Open cloudyaaron opened 4 weeks ago

cloudyaaron commented 4 weeks ago

I found several ways to calculate FPKM-UQ, and none of them performed well in the predictions. Can you clarify the formal way to calculate FPKM-UQ from a counts matrix?

jonesse3 commented 4 weeks ago

@cloudyaaron Can you provide what methods you have already tried and the output/errors from running the tool? Are you expecting a certain primary tumor type from running the tool on your samples?

If you haven't already, you can try htseq-tool.

Example commands to calculate FPKM-UQ values with htseq-tool:

Getting the gene lengths:

htseq-tools gene_lengths --gtf_file CanFam3.1.104.gtf --out_file gene_lengths.csv

Getting the FPKM and FPKM-UQ values:

htseq-tools fpkm --aggregate_length_file gene_lengths.csv --htseq_counts HTseq_Glioma/SRR10362449_Glioma_Counts.txt --output_prefix FPKM_UQ/SRR10362449.Glioma

cloudyaaron commented 3 weeks ago

I haven't tried htseq-tool yet. I used featureCounts to get the raw read counts and then calculated the FPKM-UQ values using the method in the GDC documentation. I will try htseq-tool and let you know the results. Currently, I'm not getting any errors; the results just show poor prediction scores.
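
For reference, the FPKM-UQ formula as generally stated in the GDC documentation is FPKM-UQ = RC_g * 10^9 / (RC_g75 * L), where RC_g is the read count for a gene, L is its length in base pairs, and RC_g75 is the 75th-percentile read count in the sample. The sketch below is only an illustration of that formula, not the code used by htseq-tools or TULIP. The function name `fpkm_uq` and the `in_quartile` argument are hypothetical. Which genes enter the upper quartile (all genes or protein-coding only, with or without zero-count genes) and the exact percentile interpolation are assumptions here; they differ between GDC data releases and are a common source of mismatched values.

```python
from statistics import quantiles


def fpkm_uq(counts, lengths, in_quartile=None):
    """Return FPKM-UQ values for one sample (illustrative sketch only).

    counts      -- raw read counts, one per gene
    lengths     -- gene lengths in base pairs, same order as counts
    in_quartile -- optional booleans selecting the genes used to compute
                   the upper quartile (assumption: all genes if omitted)
    """
    if in_quartile is None:
        in_quartile = [True] * len(counts)

    # Counts of the genes that define the upper-quartile reference value.
    reference = [c for c, keep in zip(counts, in_quartile) if keep]

    # 75th percentile with linear interpolation between data points.
    uq = quantiles(reference, n=4, method="inclusive")[2]
    if uq == 0:
        raise ValueError("upper-quartile count is zero; cannot normalize")

    # FPKM-UQ = RC_g * 10^9 / (RC_g75 * L)
    return [c * 1e9 / (uq * length) for c, length in zip(counts, lengths)]


# Toy data, not taken from any real sample.
counts = [0, 10, 20, 30, 40]
lengths = [1000, 2000, 1000, 1000, 4000]
values = fpkm_uq(counts, lengths)
```

Comparing a few genes from a calculation like this against the htseq-tools output for the same sample is one way to check whether the two approaches choose the upper-quartile gene set differently.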