almaan / stereoscope

Spatial mapping of cell types by integration of transcriptomics data
MIT License

How does stereoscope eventually decide the input genes? #12

Closed BioAmelie closed 3 years ago

BioAmelie commented 4 years ago

Hi @almaan,

I used -gl to specify a custom gene list containing 5459 genes, but stereoscope reports the error ValueError: Shape of passed values is (5459, 8), indices imply (5461, 8). This is the command I ran: stereoscope run --sc_cnt ../3170_st/All_ko3_celltype_cnt.tsv --sc_labels ../3170_st/All_ko3_celltype_meta.tsv -o marker --st_cnt ../3170_st/st_cnt.tsv --gpu -mc 10 -stb 2048 -scb 2048 -gl ./celltype_marker.txt -sce 75000 -ste 75000 . Is this caused by the -mc parameter? I also wonder whether stereoscope filters genes based on the -gl list; if so, the number of input genes should be at most 5459, not more.

almaan commented 4 years ago

Hello,

Could you show me the output of head celltype_marker.txt, just so I can see how it is formatted? stereoscope filters genes based on the list but only includes those present in the data, so this error is a bit puzzling to me.
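The filtering behaviour described above can be illustrated with a minimal sketch. This is not stereoscope's actual code: the function name select_genes and the toy gene names are assumptions made for illustration. It only shows why filtering by a list can never yield more genes than the list contains.

```python
def select_genes(gene_list, sc_genes, st_genes):
    """Return the genes from gene_list found in both data sets.

    Hypothetical helper, not part of stereoscope. The order of gene_list
    is kept and repeated entries are only counted once.
    """
    shared = set(sc_genes) & set(st_genes)
    seen = set()
    kept = []
    for gene in gene_list:
        if gene in shared and gene not in seen:
            kept.append(gene)
            seen.add(gene)
    return kept


# toy data: one requested gene is absent from both data sets,
# and one is missing from the ST data only
gene_list = ["Trbc2", "Cd3d", "Cd3g", "NotAGene"]
sc_genes = ["Cd3d", "Trbc2", "Cd3g", "Il7r"]
st_genes = ["Trbc2", "Cd3d", "Il7r"]

kept = select_genes(gene_list, sc_genes, st_genes)
print(len(gene_list), "requested,", len(kept), "kept:", kept)
```

Comparing the number of requested genes with the number kept is a quick way to see whether the list and the count matrices disagree before starting a long run.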

Best Alma

BioAmelie commented 4 years ago

Hi @almaan,

These are the first ten rows of celltype_marker.txt. I have checked that all the genes in celltype_marker.txt exist in both the single-cell data and the ST data.

Trbc2

Cd3d

Cd3g

Trac

Cd3e

Emb

Il7r

Icos

Cd28

Bcl11b

I found that stereoscope runs successfully if I remove -mc 10. This command completed without errors: stereoscope run --sc_cnt ../3170_st/All_ko3_celltype_cnt.tsv --sc_labels ../3170_st/All_ko3_celltype_meta.tsv -o marker --st_cnt ../3169_st/st_cnt.tsv --gpu -stb 2048 -scb 2048 -gl ../3170_st2/celltype_marker.txt -sce 50000 -ste 50000

almaan commented 4 years ago

Hello, and thanks for the feedback!

Also, great to hear that things are working for you. I would guess that the error is a consequence of blank rows in the gene list: the list should contain one gene per row, and blank rows will probably introduce errors that propagate through the code!

I would highly recommend trying a list that looks like:

Trbc2
Cd3d
Cd3g
Trac
Cd3e
Emb
Il7r
Icos
Cd28
Bcl11b

and so forth, rather than the one you are currently using - I'm not sure removing `-mc 10` actually fixes the problem completely, and some weird effects might still be observed.

I will try to push some code later on that removes blank lines automatically, good that this issue came to light!
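Until such a fix is available, a gene list can be cleaned before it is passed to -gl. The sketch below is only an illustration and not the code that was later pushed to stereoscope; the function name read_gene_list and the temporary demo file are assumptions.

```python
import os
import tempfile


def read_gene_list(path):
    """Read one gene per row, dropping blank rows and surrounding whitespace.

    Hypothetical helper, not part of stereoscope.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# demonstration on a temporary file that contains blank rows
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Trbc2\n\nCd3d\n\n  Cd3g  \n")
    demo_path = tmp.name

genes = read_gene_list(demo_path)
os.remove(demo_path)
print(genes)
```

Writing the cleaned names back out, one per row, gives a file in the format recommended above.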

BioAmelie commented 4 years ago

Hi @almaan,

celltype_marker.txt does not contain blank lines; the blank lines in my previous comment were my own typing mistake.

BioAmelie commented 4 years ago

Hi almaan,

I do not know how to attach figures on GitHub, so I have written my question in a PDF file and uploaded it. I really appreciate your help.

Yours sincerely,

minfang

------------------ Original message ------------------ From: "almaan/stereoscope" <notifications@github.com>; Sent: Monday, July 27, 2020, 6:53 PM; To: "almaan/stereoscope"<stereoscope@noreply.github.com>; Cc: "Amelie"<amelie@b612.email>;"Author"<author@noreply.github.com>; Subject: Re: [almaan/stereoscope] how does stereoscope eventually deceide the input genes (#12)

BioAmelie commented 4 years ago

Hi @almaan,

I found out how to add figures to issues, so I am copying my question here. I successfully ran stereoscope with this command:

stereoscope run --sc_cnt ../3170_st/All_ko3_celltype_cnt.tsv --sc_labels ../3170_st/All_ko3_celltype_meta.tsv -o marker --st_cnt ../3169_st/st_cnt.tsv --gpu -stb 2048 -scb 2048 -gl ../3170_st2/celltype_marker.txt -sce 50000 -ste 50000

But the result is still very odd: my ST data should not contain such a high proportion of red blood cells. I then followed your suggestion to check whether the run had converged. The following is my result. [image]

If I run stereoscope progress -lf sc_loss*txt & disown, it returns a number, but the number is different each time.

First time: [image]

Second time: [image]

Then I ran the command separately on each sc_loss file. Each run returns a number (again a different one each time) and a figure. What does this number mean, and is it normal to get two different numbers when I run the command twice? From these figures I think the run has converged; am I right? Also, which parameters do you think I should tune further to get more reliable results?

stereoscope progress -lf sc_loss.2020-07-23222423.142465.txt & disown

[image]

stereoscope progress -lf sc_loss.2020-07-25124142.883368.txt & disown

[image]

almaan commented 4 years ago

Hi @BioAmelie,

I see from your screenshots that you have two loss files, which means that when you glob (i.e., use *) you feed both files into the progress module, which expects only one. In other words, stereoscope progress sc_loss*.txt is equivalent to stereoscope progress sc_loss.2020-07-23222423.142465.txt sc_loss.2020-07-25124142.883368.txt. The number you see is the ID of the process you are starting on your computer, which changes from run to run; it has nothing to do with your results. Furthermore, if I'm not mistaken, sc_loss.2020-07-23222423.142465.txt predates the other file; if so, I would look at the most recent one instead. In addition, the loss will always look flat when you compare it to the initial values. Zoom in on the plot (the window is interactive) to see whether the loss still decreases monotonically around 70k iterations (not converged) or rather oscillates around some value (converged).
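Both points above, picking a single loss file and telling a decreasing loss from an oscillating one, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: latest_loss_file and tail_trend are hypothetical helpers, not stereoscope functions, and the sketch works on plain lists of numbers because the layout of the loss file is not described in this thread. The file choice relies on the time stamp in the file name sorting chronologically, as it does for the two names shown.

```python
import glob
import os
import tempfile


def latest_loss_file(pattern):
    """Return the last match of a glob pattern when sorted by name.

    Assumes the time stamp in the file name sorts chronologically.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no file matches " + pattern)
    return matches[-1]


def tail_trend(losses, window):
    """Mean of the last `window` losses minus the mean of the `window` before.

    Clearly negative: the loss is still decreasing.
    Close to zero: the loss is flat or oscillating around a value.
    """
    if window <= 0 or len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    recent = losses[-window:]
    previous = losses[-2 * window:-window]
    return sum(recent) / window - sum(previous) / window


# demonstration: two empty files named like the ones in this thread
names = [
    "sc_loss.2020-07-23222423.142465.txt",
    "sc_loss.2020-07-25124142.883368.txt",
]
with tempfile.TemporaryDirectory() as folder:
    for name in names:
        open(os.path.join(folder, name), "w").close()
    pattern = os.path.join(folder, "sc_loss*.txt")
    latest = os.path.basename(latest_loss_file(pattern))

print(latest)
print(tail_trend([10.0, 9.0, 8.0, 7.0, 6.0, 5.0], window=3))  # still decreasing
print(tail_trend([5.0, 6.0, 5.0, 6.0], window=2))              # oscillating
```

What counts as "close to zero" depends on the noise in the loss, so the value is best read next to the zoomed-in plot rather than instead of it.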

You say that the results look strange, referring to the number of blood cells you see. I'm curious whether this is based on inspecting the estimated proportion values or on the visualization of them. For example, if the values for blood cells are very low in every spot, then when you visualize them (and the values are scaled internally) you will see a signal across the whole tissue. This indicates that the cells are evenly distributed in your tissue, not that they are "abundant": the cell type has an equally low presence everywhere.
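The effect of scaling can be seen with a few made-up numbers. The sketch assumes that each cell type is scaled by its own maximum before plotting; the exact scaling used by the visualization is not stated in this thread, and both scale_by_max and the proportion values are hypothetical.

```python
def scale_by_max(values):
    """Divide every value by the largest one (assumed scaling, for illustration)."""
    top = max(values)
    if top == 0:
        return [0.0 for _ in values]
    return [value / top for value in values]


# made-up proportions of one cell type in four spots, all close to 1 %
raw = [0.010, 0.012, 0.011, 0.012]
scaled = scale_by_max(raw)

print("largest raw proportion:", max(raw))
print("smallest scaled value:", round(min(scaled), 2))
```

Every spot ends up near the top of the colour scale even though no raw proportion exceeds about 1 %, which is why the raw values should be checked before reading a plot as high abundance.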

Best Alma