giannimonaco / ABIS


Estimated frequencies are problematic #11

Closed erikfel97 closed 3 years ago

erikfel97 commented 4 years ago

Hello, First, awesome tool! Thank you very much. I ran into a problem that I'm not sure how to solve, and it would be great to have your advice. I'm using an RNA-seq dataset (PBMCs) in which the reported values are presumably FPK (unfortunately, I can't know for sure). I performed deconvolution using your R tool and the results weren't that promising (attached here as Result 1). Then I tried to transform the counts into TPM values by dividing the counts for each subject by the sum of the counts in that subject and multiplying by one million. I ran your algorithm again and the performance was much better but still not perfect (attached here as Result 2). The problems in Result 2 are easy to see:

  1. Negative values
  2. Values over 100
  3. Frequencies that sum to values much higher than 100

Would be great to have your advice
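The rescaling described above (divide each subject's values by that subject's total, then multiply by one million) can be sketched as follows. This is a minimal illustration, not part of ABIS: the function name `rescale_to_per_million` and the toy values are hypothetical. Note that the result is a true TPM only if the input is already length-normalized (for example FPKM); applied to raw counts, the same arithmetic yields counts per million, which is not equivalent to TPM.

```python
def rescale_to_per_million(columns):
    """Rescale each sample so that its values sum to one million.

    columns: dict mapping sample name -> list of non-negative
    expression values (one entry per gene, same gene order in all samples).
    """
    scaled = {}
    for sample, values in columns.items():
        total = sum(values)
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no signal to rescale")
        # Divide by the per-sample total, then multiply by one million.
        scaled[sample] = [v / total * 1_000_000 for v in values]
    return scaled


# Hypothetical toy data: two samples, three genes each.
expr = {"S1": [10.0, 30.0, 60.0], "S2": [5.0, 5.0, 0.0]}
tpm_like = rescale_to_per_million(expr)
```

After this step every sample column sums to one million, so samples are on a common scale regardless of their original totals.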

giannimonaco commented 4 years ago

Hi! Thank you for trying out the tool! Yes, the results do not look great. In Result 1 the values are overall deflated, and in Result 2 they are overall inflated. However, even if they were on a compatible scale, it seems that you would still have more basophils than any other cell type, which is very unusual unless it is caused by a particular disease. Do you think your PBMC samples are contaminated somehow? Otherwise, to reduce technical variability, you should pre-process your FASTQ files from scratch: run kallisto on them and then import the results with tximport.

Best, Gianni
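The three symptoms reported in this issue (negative values, values over 100, and totals far above 100) can be screened for automatically before interpreting a deconvolution result. The sketch below is only an illustration: `check_frequencies`, the `max_total` threshold, and the example numbers are hypothetical and are not taken from ABIS or from the attached results.

```python
def check_frequencies(freqs, max_total=100.0):
    """Return a list of problems found in one sample's estimated frequencies.

    freqs: dict mapping cell type -> estimated frequency in percent.
    max_total: assumed upper bound for the sum of all frequencies.
    """
    problems = []
    for cell_type, value in freqs.items():
        if value < 0:
            problems.append(f"{cell_type}: negative value ({value})")
        elif value > 100:
            problems.append(f"{cell_type}: value over 100 ({value})")
    total = sum(freqs.values())
    if total > max_total:
        problems.append(f"total: {total} exceeds {max_total}")
    return problems


# Hypothetical sample showing all three symptoms.
sample = {"Basophils": 120.0, "Monocytes": -3.0, "T cells": 40.0}
issues = check_frequencies(sample)
```

A sample that triggers any of these checks most likely has input on the wrong scale, which is consistent with the advice above to reprocess the data from the FASTQ files.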

On Tue, 28 Jul 2020 at 15:39, erikfel97 wrote:

[original message quoted in full, as above; attached image: https://user-images.githubusercontent.com/42279845/88672645-5c732580-d0f0-11ea-8356-f6b110b9144b.png]
