AlexsLemonade / scpca-docs

User information about ScPCA processing
https://scpca.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add FAQ for using data in python instead of R #34

Closed: allyhawkins closed this 3 years ago

allyhawkins commented 3 years ago

Closes #13. Here I added an FAQ addressing how to convert the RDS files to a usable format that can be read into python and used with scanpy. After a lot of trial and error playing with HDF5 files and loom files, I think the most efficient solution is to use DropletUtils::write10xCounts() to write the counts matrix into 10x format first and then read it into python (thanks Josh for starting to lead me down this route!). Here we can save the counts matrices for RNA-seq and CITE-seq separately, then use scanpy's read_10x_mtx() function, pointing it at the directory containing the files we just wrote out from R; it reads them in and creates the AnnData object needed for use with scanpy. Writing and importing the files this way takes < 30 seconds, versus the 15-30 minutes it took to both write loom files and then read them back in, which was not efficient.
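
For reference, here's roughly what the python side looks like: a minimal sketch, assuming the counts matrix has already been written out from R with DropletUtils::write10xCounts(), and using a hypothetical directory name:

```python
import scanpy as sc

# Read the 10x-format files (matrix.mtx, barcodes.tsv, genes/features.tsv)
# written out from R with DropletUtils::write10xCounts().
# "rna_counts/" is a hypothetical path; point it at the directory you wrote to.
adata = sc.read_10x_mtx(
    "rna_counts/",          # directory containing the 10x-format files
    var_names="gene_ids",   # index genes by ID; "gene_symbols" also works
)

# adata is now an AnnData object ready for use with scanpy,
# e.g. standard preprocessing steps:
sc.pp.filter_cells(adata, min_genes=200)
```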

The biggest downside here vs. suggesting the use of loom is that we lose the associated colData and rowData, but I don't think that's too much of a concern, as I'm sure people going to the trouble of working with it in python will be recalculating much of that on their own. I also added a link that provides some suggestions on how to save the colData and rowData from R and then append them to an AnnData object; the gist of that is sketched below. Other than that, I think this is a fairly straightforward and efficient approach, but if we want to include the loom option as well I can add that in.
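
A hedged sketch of that append step, where the CSV file names are hypothetical and assume the colData/rowData were exported from R with something like write.csv(), indexed by barcode and gene ID respectively:

```python
import pandas as pd
import scanpy as sc

adata = sc.read_10x_mtx("rna_counts/")  # hypothetical path, as above

# colData/rowData exported from R as CSVs (hypothetical file names).
col_data = pd.read_csv("coldata.csv", index_col=0)
row_data = pd.read_csv("rowdata.csv", index_col=0)

# join() aligns on the index (barcodes for obs, gene IDs for var),
# so the cell and gene metadata end up attached in the right order.
adata.obs = adata.obs.join(col_data)
adata.var = adata.var.join(row_data)
```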

Also noting that this is stacked on #33.

allyhawkins commented 3 years ago

I went ahead and added a couple of sentences on the version we are using when writing out the files, noting that the files would not be compressed when using the default version=2. Let me know what you think.
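
For what it's worth, on the python side I believe this shouldn't matter much either way, since scanpy's read_10x_mtx() detects the format from the file names it finds (plain matrix.mtx/genes.tsv/barcodes.tsv for the legacy output vs. gzipped matrix.mtx.gz/features.tsv.gz/barcodes.tsv.gz for version=3):

```python
import scanpy as sc

# read_10x_mtx() figures out which variant it was given from the files present:
#   version=2 output -> matrix.mtx, genes.tsv, barcodes.tsv (uncompressed)
#   version=3 output -> matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz
adata = sc.read_10x_mtx("rna_counts/")  # hypothetical path; works for either
```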