Integration step fails

BoxWong commented 1 year ago

Hi Yijia, I've run into a problem integrating the reference adata (the PBMC data downloaded from your Dropbox link) with my own PBMC scATAC-seq data, which I processed with QuickATAC. The query data that I downloaded from Dropbox integrated just fine.

So the problem is that the integration failed due to different vars in reference dataset and my own dataset. The code I ran was:

integrated_adata1 = scATAnno_assignment.scATAnno_integrate(reference_data, query_data1, variable_prefix = "Pub", sample_size = 25000)

and the error message was:

ValueError: Adatas have different variables. Please specify join_vars='inner' for intersection.

I then reran QuickATAC with the peak files I exported from your reference dataset and generated new output. Integrating the newly generated files with the reference dataset still throws the same error:

ValueError: Adatas have different variables. Please specify join_vars='inner' for intersection.

Since the n_var of my data is smaller than that of the reference dataset, I subsetted the reference data by:

sub_ref_data = reference_data[:, (query_data2.var.index & reference_data.var.index)]

and did the integration again:

integrated_adata2_subref = scATAnno_assignment.scATAnno_integrate(sub_ref_data, query_data2, variable_prefix = "Pub", sample_size = 25000)
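As a minimal sketch of the subsetting step above, the shared peaks can be computed explicitly and applied to both objects rather than to the reference alone. The helper name shared_peaks and the toy peak names are hypothetical, and the commented AnnData lines assume anndata is installed. Recent pandas versions no longer treat & between two Index objects as a set intersection, so an explicit intersection is the safer spelling. Whether this resolves the error reported next is not established in this thread.

```python
def shared_peaks(ref_names, query_names):
    """Return the peak names present in both inputs, in reference order."""
    query_set = set(query_names)
    return [name for name in ref_names if name in query_set]


# Toy peak names standing in for reference_data.var_names and
# query_data2.var_names (hypothetical values, not from the real datasets).
ref_names = ["chr1:100-600", "chr1:900-1400", "chr2:50-550", "chr3:10-510"]
query_names = ["chr2:50-550", "chr1:100-600", "chr9:1-501"]

shared = shared_peaks(ref_names, query_names)
print(shared)

# With AnnData objects, the same list could index both sides so that the
# variables match in content and in order (assumes anndata is installed):
#   sub_ref_data = reference_data[:, shared].copy()
#   sub_query_data = query_data2[:, shared].copy()
```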

Now the 'Compute similarity matrix' step seems to run fine, but the normalization step fails with a different error:

PanicException: called Result::unwrap() on an Err value: TooSteep

PS: I should also mention that I tried changing your script (scATAnno_integration.py) by adding the parameter "join_vars = 'inner'" to line 44, the line that caused the problem. Doing this gave me the same error: PanicException: called Result::unwrap() on an Err value: TooSteep

Sorry for the lengthy text and looking forward to your input.

Best, Xin

Yijia-Jiang commented 1 year ago

Hi Xin, the integration failed because the peaks are not unified between the reference and the query. We are working on a script for using QuickATAC with scATAnno and will post it soon.
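One way to see what "not unified" means in practice is to compare the peak names of the two objects directly. The sketch below is only an illustration: compare_peak_sets is a hypothetical helper, not part of scATAnno, and the toy lists stand in for reference_data.var_names and query_data.var_names.

```python
def compare_peak_sets(ref_names, query_names):
    """Summarize how two lists of peak names differ."""
    ref_names, query_names = list(ref_names), list(query_names)
    ref_set, query_set = set(ref_names), set(query_names)
    return {
        "n_reference": len(ref_names),
        "n_query": len(query_names),
        "n_shared": len(ref_set & query_set),
        "only_in_reference": len(ref_set - query_set),
        "only_in_query": len(query_set - ref_set),
        # True only when both lists hold the same names in the same order.
        "identical_order": ref_names == query_names,
    }


# Toy peak names (hypothetical values, not from the real datasets).
report = compare_peak_sets(
    ["chr1:100-600", "chr1:900-1400", "chr2:50-550"],
    ["chr1:100-600", "chr2:50-550", "chr4:5-505"],
)
for key, value in report.items():
    print(f"{key}: {value}")
```

If only_in_reference and only_in_query are both zero and identical_order is true, the two objects share one peak set; otherwise the query matrix would need to be rebuilt on the reference peaks.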

Yijia-Jiang commented 8 months ago

Hi Xin, we have uploaded a script to preprocess your PBMC scATAC-seq data (https://github.com/Yijia-Jiang/scATAnno-main/blob/main/prep_data/RunPBMC.sh).

To prepare the PBMC peak matrix, you will need to use QuickATAC (https://github.com/AllenWLynch/QuickATAC/tree/main/quickatac). You can then use the script RunPBMC.sh to generate the matrix for your query scATAC data.

Feel free to let me know if you have any questions.

Yijia-Jiang / scATAnno-main

Integration step fails #2