kulansam opened this issue 3 weeks ago
Hi,
I would suggest using a preprocessed .csv file if available. The format of the file is the same as described in the README, and I am adding it below as well:
| gene | uID | absX | absY |
|---|---|---|---|
| AKAP11 | 2 | -1401.666 | -2956.618 |
| SIPA1L3 | 3 | -1411.692 | -2936.609 |
| THBS1 | 925 | -764.6989 | -1604.828 |
The four columns are gene (gene name), uID (cell ID), absX (X coordinate), and absY (Y coordinate); if the data is 3D, you can add absZ as well.
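For reference, a minimal file in this format could be written with pandas like this (the filename is just an example; add an absZ column for 3D data):

```python
import pandas as pd

# Minimal example of the expected input format, using the rows from the
# table above. Add an "absZ" column if your data is 3D.
df = pd.DataFrame({
    "gene": ["AKAP11", "SIPA1L3", "THBS1"],
    "uID":  [2, 3, 925],
    "absX": [-1401.666, -1411.692, -764.6989],
    "absY": [-2956.618, -2936.609, -1604.828],
})
df.to_csv("data_processed.csv", index=False)  # example filename
```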
If you have a file in this format, you can use the obj.load_preprocessed_data() function. Please let me know if you need any additional help.
Hi,
Thank you for your quick response.
I have my MERFISH data ready after nucleus-level cell segmentation using CellPose. Could you please guide me on how to prepare the specific input file from the output? I am unsure how to obtain the absX (X coordinate), absY (Y coordinate), and uID (cell ID) values.
However, in my case, the detected_transcripts.csv file contains the following information. I am planning to use this file to prepare the input file for the InSTAnT run.
```
index,barcode_id,global_x,global_y,global_z,x,y,fov,gene,transcript_id,cell_id
288,218,11622.136,6416.4404,0.0,1375.147,762.6618,0,genename,ENSMUSTXXXX,3965824400107100661
18,242,11647.205,6354.178,0.0,1607.2699,186.16031,0,genename,ENSMUSTXXXX-1
```
I am planning to extract the following columns: gene, cell_id (uID), global_x, global_y, and global_z (for 3D). Could you please confirm that this is the correct approach?
Additionally, it would be incredibly helpful if you could provide specific code or a pipeline to help me format this file correctly.
Thank you for your assistance!
Hi,
The file you mention looks correct. You just need to rename the columns accordingly (cell_id → uID, global_x → absX, global_y → absY, global_z → absZ). No additional preprocessing is needed on top of this if the column names are correct.
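As a sketch (not the package's own code), the renaming could look like this, using the two example rows above; the second row's cell_id is a placeholder, since it was truncated in the paste:

```python
import io
import pandas as pd

# The two example rows from detected_transcripts.csv (the second row's
# cell_id is a placeholder: it was truncated in the paste above).
raw = io.StringIO(
    "index,barcode_id,global_x,global_y,global_z,x,y,fov,gene,transcript_id,cell_id\n"
    "288,218,11622.136,6416.4404,0.0,1375.147,762.6618,0,genename,ENSMUSTXXXX,3965824400107100661\n"
    "18,242,11647.205,6354.178,0.0,1607.2699,186.16031,0,genename,ENSMUSTXXXX-1,3965824400107100661\n"
)
df = pd.read_csv(raw)

# Rename to the names InSTAnT expects and keep only those columns.
out = df.rename(columns={"cell_id": "uID", "global_x": "absX",
                         "global_y": "absY", "global_z": "absZ"})
out = out[["gene", "uID", "absX", "absY", "absZ"]]
out.to_csv("processed_instant_detected_transcripts.csv", index=False)
```

In practice you would replace the io.StringIO block with `pd.read_csv("detected_transcripts.csv")`.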
Thank you for your assistance. I have successfully loaded the data; however, I am encountering a memory issue when running the run_ProximalPairs3D() function. Here is the error message I received:
```
Running PP-3D now on 4 threads for, 115372 cells, 108834798 transcripts
/var/spool/slurmd/job3601435/slurm_script: line 16: 3742652 Killed    python instant_colocai.py
slurmstepd-compute-7-6: error: Detected 2 oomkill events in StepId=3601435.batch. Some of the step tasks have been OOM Killed.
```
I am currently using a system with 100 GB of memory and 4 threads. Could you please help me troubleshoot this problem?
Can you let me know the size of your gene panel as well? Also, is it possible to ask for more memory? I would suggest randomly sampling cells to <20k (around 10k) in order to run in a decent time, given you only have 4 threads. The algorithm constructs a Cells × Genes × Genes matrix, which is the primary source of memory consumption.
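For a rough sense of scale, the footprint of a dense Cells × Genes × Genes float64 array can be estimated like this (illustrative only; InSTAnT's actual internal representation may differ):

```python
# Back-of-envelope memory estimate for a dense Cells x Genes x Genes
# float64 array. Illustrative only; the real memory layout may differ.
def estimate_gb(n_cells: int, n_genes: int, bytes_per_value: int = 8) -> float:
    return n_cells * n_genes * n_genes * bytes_per_value / 1024**3

print(estimate_gb(115_372, 315))  # full dataset with a 315-gene panel, ~85 GB
print(estimate_gb(10_000, 315))   # 10k sampled cells, ~7 GB
```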
> Can you let me know the size of your gene panel as well?

315 genes.

> Also, is it possible to ask for more memory?

I have tried with 140 GB; still the same problem.

> I would suggest randomly sampling cells to <20k (I would suggest around 10k) in order to run in a decent time given you only have 4 threads.

If I do that, I may miss some of the co-localized gene pairs, right?
This would be an issue in a diverse tissue like the brain. Can I ask which tissue you are running on? If cell types are annotated, you can sample based on cell type and get around 20k cells, which should allow you to run in 100 GB.
@kulansam, depending on the question you're asking, sampling may not be an issue. For example, if you're interested in d-colocalization (global co-localization), you should mostly recover colocalizing gene pairs, because d-colocalization is robust to the number of cells. You can test this yourself by first running on a sample of 10k cells, then running on a sample of 5k cells (a subset of the 10k sampled cells), and comparing the false positives and false negatives. The signal you may miss is colocalization specific to a rare cell type. If you want to recover colocalization from rare cell types as well, you should sample based on cell type.
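If you go the cell-type route, a minimal sketch of stratified sampling with pandas could look like this (a hypothetical cell_type column is assumed; uID matches the input format above):

```python
import pandas as pd

def stratified_sample(cells: pd.DataFrame, n_total: int, seed: int = 0) -> pd.DataFrame:
    """Sample ~n_total cells, spread evenly across cell types."""
    groups = [g for _, g in cells.groupby("cell_type")]
    per_type = max(1, n_total // len(groups))
    return pd.concat(
        [g.sample(min(len(g), per_type), random_state=seed) for g in groups]
    )

# Hypothetical per-cell table; in practice this comes from your annotation.
cells = pd.DataFrame({"uID": range(10), "cell_type": ["A"] * 6 + ["B"] * 4})
sampled = stratified_sample(cells, n_total=4)

# Then keep only the transcripts whose uID survived the sampling, e.g.:
# transcripts = transcripts[transcripts["uID"].isin(sampled["uID"])]
```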
Thank you! I am currently running the algorithm by downsampling cells based on the expression of specific genes, which significantly reduces the sample size. However, when I run the algorithm, I encounter the following error:
```
Running PP-3D now on 4 threads for, 12615 cells, 2366637 transcripts
/home/spatial/envs/instant/lib/python3.10/site-packages/anndata/_core/anndata.py:183: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
Cell-wise Proximal Pairs Time : 90.4 seconds
STEP2: run_ProximalPairs3D DONE
Running Global Colocalization now on 4 threads
Number of cells: 106088, Number of genes: 9
Global Colocalization initialized ..
Low Precision Global Colocalization Time: 19.41 seconds
Traceback (most recent call last):
  File "/home//spatial/envs/instant/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 1153, in new
    engine = config.get_option(f"io.excel.{ext}.writer", silent=True)
  File "/home//spatial/envs/instant/lib/python3.10/site-packages/pandas/_config/config.py", line 272, in call
    return self.func(*args, **kwds)
  File "/home//spatial/envs/instant/lib/python3.10/site-packages/pandas/_config/config.py", line 146, in _get_option
    key = _get_single_key(pat, silent)
  File "/home//spatial/envs/instant/lib/python3.10/site-packages/pandas/_config/config.py", line 132, in _get_single_key
    raise OptionError(f"No such keys(s): {repr(pat)}")
pandas._config.config.OptionError: No such keys(s): 'io.excel.csv.writer'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lab-share/Cardio-Chen-e2/Public//colocalization_instant/instant_colocai.py", line 31, in
```
My code is below:

```python
sample_name = 'cntrl'
obj.load_preprocessed_data(data="./" + str(sample_name) + "processed_instant_detected_transcripts.csv")
print("STEP1: DATA LOADED DONE")

obj.run_ProximalPairs3D(distance_threshold=4, min_genecount=20,
                        pval_matrix_name=str(sample_name) + "_pvals.pkl",
                        gene_count_name=str(sample_name) + "_gene_count.pkl")
print("STEP2: run_ProximalPairs3D DONE")

obj.run_GlobalColocalization(high_precision=False, alpha_cellwise=0.05,
                             glob_coloc_name=str(sample_name) + "global_colocalization.csv",
                             exp_coloc_name=str(sample_name) + "expected_colocalization.csv",
                             unstacked_pvals_name=str(sample_name) + "unstacked_global_pvals.csv")
print("STEP3: obj.run_GlobalColocalization DONE")

a_reordered1 = a_reordered[['uID', 'absX', 'absY', 'absZ']]
a_reordered1.to_csv(str(sample_name) + "_cells_locations.csv", index=False, header=True)
print("STEP4: obj.run_spatial_modulation DONE")

obj.run_spatial_modulation(str(sample_name) + "_cells_locations.csv", inter_cell_distance=100,
                           spatial_modulation_name=str(sample_name) + "_spatial_modulation.csv")
print("ALL STEPS DONE")
```
After successfully running the obj.run_ProximalPairs3D() function, I encountered an error while executing the obj.run_GlobalColocalization() function. Could you please help me resolve this issue?
If you change the following line

`unstacked_pvals_name = str(sample_name)+"unstacked_global_pvals.csv")`

to

`unstacked_pvals_name = str(sample_name)+"unstacked_global_pvals.xlsx")`

it should work. The unstacked p-values table is written via pandas' Excel writer, which picks its engine from the file extension, so a .csv name triggers the 'io.excel.csv.writer' error you saw.
Hi,
Thank you for developing the InSTAnT package for exploring the patterns of gene-pair co-localization in spatial transcriptomics data.
I have a MERFISH dataset and would like to use this software to detect the co-localization patterns of genes. However, when I attempt to import my data using the 'obj.preprocess_and_load_data' and 'obj.load_preprocessed_data' functions, I encounter an error.
For example, when I run the following command:

```python
obj.preprocess_and_load_data(expression_data='./detected_transcripts.csv', barcode_data='./codebook.csv')
```

I receive the following error:

```
KeyError: "['bit_barcode'] not in index"
```
Could you please let me know how to resolve this error? It would be greatly appreciated if you could provide guidance on how to prepare the input tables for running InSTAnT.
Also, can you let me know the contents of the processed file (the example file in your tutorial: `obj.load_preprocessed_data(data = f'data/u2os_new/data_processed.csv')`)?
Thank you for your assistance!