Estimated frequencies are problematic

Hello, first of all: awesome tool! Thank you very much. I ran into a problem that I'm not sure how to solve, and it would be great to have your advice. I'm using RNA-seq data (PBMCs) in which the reported values are presumably FPK (unfortunately, I can't know for sure). I performed deconvolution using your R tool and the results weren't that promising (attached here as Result 1). Then I tried to transform the counts into TPM values by dividing the counts for each subject by the sum of the counts in that subject and multiplying by one million. I ran your algorithm again and the performance was much better but still not perfect (attached here as Result 2). The problems in Result 2 are easy to see:

Negative values
Values over 100
Summing up the frequencies to values much higher than 100
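The three symptoms listed above can be checked mechanically for each sample. The following is only a minimal sketch, assuming the estimated frequencies are percentages stored per cell type; the function name `check_frequencies` and the example values are hypothetical and are not taken from the attached results or from the ABIS code.

```python
def check_frequencies(freqs, tol=1e-9):
    """Return a list of problems found in one sample's estimated frequencies.

    freqs: dict mapping cell type -> estimated percentage (assumed layout).
    """
    issues = []

    # Symptom 1: negative estimates.
    negatives = sorted(c for c, v in freqs.items() if v < 0)
    if negatives:
        issues.append("negative values: " + ", ".join(negatives))

    # Symptom 2: a single cell type estimated above 100%.
    too_high = sorted(c for c, v in freqs.items() if v > 100)
    if too_high:
        issues.append("values over 100: " + ", ".join(too_high))

    # Symptom 3: the estimates for the sample add up to more than 100%.
    total = sum(freqs.values())
    if total > 100 + tol:
        issues.append("sum above 100: %.1f" % total)

    return issues


# Hypothetical values, for illustration only.
bad_sample = {"Basophils": 120.0, "Monocytes": -5.0, "T cells": 40.0}
ok_sample = {"Basophils": 1.0, "Monocytes": 20.0, "T cells": 60.0}

bad_issues = check_frequencies(bad_sample)
ok_issues = check_frequencies(ok_sample)
```

Here `bad_issues` reports all three symptoms and `ok_issues` is empty.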

Would be great to have your advice
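For reference, the rescaling described in the post (divide each subject's values by that subject's total, then multiply by one million) can be sketched as follows. This is an illustration under assumptions: the nested-dict layout, the function name `rescale_to_per_million`, and the gene and subject names are all hypothetical. Note that this operation gives TPM only if the input values are already normalized by gene length (as FPKM or FPK are); applied to raw counts it gives counts per million instead.

```python
def rescale_to_per_million(expr):
    """Rescale each sample so that its values sum to one million.

    expr: dict mapping sample name -> dict mapping gene -> non-negative value
    (assumed layout).
    """
    scaled = {}
    for sample, genes in expr.items():
        total = sum(genes.values())
        if total <= 0:
            raise ValueError("sample %r has no positive signal" % sample)
        # Divide by the per-sample total, then multiply by one million.
        scaled[sample] = {g: v / total * 1e6 for g, v in genes.items()}
    return scaled


# Hypothetical values, for illustration only.
example = {
    "subject_1": {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 60.0},
    "subject_2": {"GENE_A": 5.0, "GENE_B": 5.0, "GENE_C": 10.0},
}

tpm_like = rescale_to_per_million(example)
```

After rescaling, every subject sums to one million, so samples are on the same scale regardless of their original totals.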

Hi! Thank you for trying out the tool! Yes, the results do not look great. In Result 1 the values are overall deflated and in Result 2 they are overall inflated. However, even if they were on a compatible scale, it seems that you would still have more basophils than any other cell type, which is very weird unless it is caused by a particular disease. Do you think your PBMC samples are contaminated somehow? Otherwise, to reduce technical variability, you should pre-process your FASTQ files from scratch: run kallisto on them and then tximport.

Best, Gianni


giannimonaco / ABIS

Estimated frequencies are problematic #11