KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.
https://scaden.readthedocs.io
MIT License
72 stars, 26 forks

Bulk simulation #22

Closed Kai6662 closed 4 years ago

Kai6662 commented 4 years ago

Hi,

I processed the scRNA-seq dataset(s) that I want to use for training. I used Seurat for this and obtained cell type labels. Then I created two input files (_norm_counts_all.txt for the count data, _celltypes.txt for the cell type labels). But when I run bulk_simulation.py to do the bulk simulation, it fails with an error: IndexError: list index out of range. I don't think there is anything wrong with my files. What could be the problem?

KevinMenden commented 4 years ago

Hi,

could you please provide the exact error message that you get?

And, if possible, the head of your celltype labels file and your count table - so I can check whether they fulfill the requirements.

Kevin

Kai6662 commented 4 years ago

Hi,

Please see the attached files. Thank you.

Best regards,

Kai

new.single.cell.expressioncelltypes.txt https://drive.google.com/file/d/1Knsicn2FtCSyAR-MyJCW7d33KLYNNvH/view?usp=drive_web
new.single.cell.expression_norm_counts_all.txt https://drive.google.com/file/d/1_zbC8ETHoptvEtjT8U4obGwQ4gO_q2EM/view?usp=drive_web

(Sent by email in reply to Kevin Menden's message of Thu, Jan 9, 2020 at 8:45 AM.)

KevinMenden commented 4 years ago

Hi Kai,

thanks for sending those through. You just have to transpose your expression matrix so that rows = samples and columns = genes, and then it should work.

However, I found a small bug in the bulk_simulation.py script while testing your data, which I thought I had fixed actually... I quickly fixed it, so if you just clone the repository and use the corrected version of the 'bulk_simulation.py' script, it should be working fine!

And for your convenience, here's a link to the transposed expression matrix, which works together with your cell type labels (you'll have to change the name again, of course). https://drive.google.com/file/d/1qDitSaHb2nAkLHmDe6Ad5TGQkLYgjzq1/view?usp=sharing
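For anyone who needs to do the transposition themselves, here is a minimal sketch using only the Python standard library. It assumes a tab-separated table in which every row, including the header, has the same number of fields; the function name `transpose_tsv` and the toy table are hypothetical and not part of Scaden.

```python
import csv
import io


def transpose_tsv(src, dst):
    """Read a tab-separated table from src and write its transpose to dst."""
    rows = list(csv.reader(src, delimiter="\t"))
    # zip(*rows) silently truncates ragged input, so check the shape first.
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("rows have differing numbers of fields")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))


# Toy genes-by-cells table (made-up values, for illustration only).
genes_by_cells = "gene\tcell1\tcell2\nGeneA\t1\t2\nGeneB\t3\t4\n"
out = io.StringIO()
transpose_tsv(io.StringIO(genes_by_cells), out)
transposed = out.getvalue()
```

With real files, pass handles opened with `open(path, newline="")` instead of the `StringIO` objects. Tables written from R with row names often have a header that is one field shorter than the data rows; the sketch raises a `ValueError` for those rather than guessing how to align them.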

Hope that solves things!

Kevin

Kai6662 commented 4 years ago

Hi Kevin,

I cloned the repository and tried again, but it still fails:

    python /hpc/dhl_ec/kcui/scaden/scaden/preprocessing/bulk_simulation.py --cells 100 --samples 50 --data /hpc/dhl_ec/kcui/deconvolution/2.Scaden/plaque/only_plaque
    Datasets: []
    Traceback (most recent call last):
      File "/hpc/dhl_ec/kcui/scaden/scaden/preprocessing/bulk_simulation.py", line 281, in <module>
        all_genes = get_common_genes(xs, type='intersection')
      File "/hpc/dhl_ec/kcui/scaden/scaden/preprocessing/bulk_simulation.py", line 209, in get_common_genes
        com_genes = genes[0]
    IndexError: list index out of range

Best regards, Kai

On Thu, Jan 9, 2020 at 11:42 AM Kevin Menden notifications@github.com wrote:

Closed #22 https://github.com/KevinMenden/scaden/issues/22.


KevinMenden commented 4 years ago

The problem is that the code can't find the datasets you're using. You can see this in the output line 'Datasets: []': the empty brackets mean it could not find a usable dataset.

The program looks in the directory you give it for files that match the pattern you specify (by default it's '*_norm_counts_all.txt'). Maybe try adding a '/' to the end of your directory:

--data /hpc/dhl_ec/kcui/deconvolution/2.Scaden/plaque/only_plaque/
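To illustrate why a trailing '/' can matter, here is a small sketch. It assumes the file lookup builds its glob pattern by plain string concatenation of the directory and the pattern; that is an assumption made for illustration, not a statement about what bulk_simulation.py actually does. The function names and the toy file are hypothetical.

```python
import glob
import os
import tempfile

PATTERN = "*_norm_counts_all.txt"


def find_concat(data_dir, pattern=PATTERN):
    """Hypothetical lookup that glues directory and pattern together as strings."""
    return sorted(glob.glob(data_dir + pattern))


def find_joined(data_dir, pattern=PATTERN):
    """Same lookup via os.path.join, which supplies the separator when needed."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "only_plaque")
        os.mkdir(data_dir)
        # Create one empty file whose name matches the default pattern.
        open(os.path.join(data_dir, "plaque_norm_counts_all.txt"), "w").close()
        return (
            len(find_concat(data_dir)),           # searches for "only_plaque*_norm_counts_all.txt"
            len(find_concat(data_dir + os.sep)),  # searches inside the directory
            len(find_joined(data_dir)),           # works with or without the separator
        )


counts = demo()
```

Under that assumption, the path without the separator matches nothing, which would give an empty dataset list and then the IndexError on `genes[0]`. Joining paths with `os.path.join` avoids the dependence on how the user types the directory.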

Kai6662 commented 4 years ago

Hi Kevin,

It is working now! Thanks for your help.

Best regards, Kai

(Sent by email in reply to Kevin Menden's message of Thu, Jan 9, 2020 at 2:29 PM.)

KevinMenden commented 4 years ago

Great :-)

No problem!

Kai6662 commented 4 years ago

Hi,

I am using your software, but the results from Scaden and MuSiC differ considerably, and I am looking for a way to evaluate them. Why are they so different? Do you have any idea?

Best regards, Kai

[image: image.png]


KevinMenden commented 4 years ago

Hi Kai,

it's of course not quite reassuring to get different results from two algorithms. With MuSiC, we observed that it sometimes gives quite significantly wrong predictions, and we didn't know why. However, that's not to say that it doesn't work, as it achieved quite good performance on other datasets - similar to Scaden.

Of course I am biased and am inclined to say you can trust Scaden :) but in this case, I would actually use CIBERSORTx and see what kind of results this gives. If it is similar to one of the other algorithms, then that's probably the best prediction.
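One simple way to check which predictions agree is to compare the estimated cell type fractions pairwise. The following is only a sketch: the tool labels and the numbers are made-up toy values, not real output from Scaden, MuSiC, or CIBERSORTx, and mean absolute difference is just one possible agreement measure.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two fraction dicts over shared cell types."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no cell types in common")
    return sum(abs(a[c] - b[c]) for c in shared) / len(shared)


def closest_pair(predictions):
    """Return the pair of tool names whose predictions differ the least."""
    names = sorted(predictions)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return min(
        pairs,
        key=lambda p: mean_abs_diff(predictions[p[0]], predictions[p[1]]),
    )


# Toy fractions for one bulk sample (hypothetical values, illustration only).
toy = {
    "tool_a": {"T": 0.50, "B": 0.30, "Mono": 0.20},
    "tool_b": {"T": 0.10, "B": 0.20, "Mono": 0.70},
    "tool_c": {"T": 0.45, "B": 0.30, "Mono": 0.25},
}
best = closest_pair(toy)
```

Agreement between two tools does not prove that they are correct, so this is a consistency check rather than a validation against ground truth.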

Hope that helps!

Best, Kevin

Kai6662 commented 4 years ago

Hi Kevin,

Thank you so much!

Best, Kai

(Sent by email in reply to Kevin Menden's message of Mar 11, 2020 at 11:15 AM.)