AlexsLemonade / sc-data-integration


Plot within batch ARI #211

Closed by allyhawkins 1 year ago

allyhawkins commented 1 year ago

Closes #203

Here I added a plot of the within-batch ARI to the integration report. To add it, I first needed a way to calculate the within-batch ARI, which requires the individual SCE objects for every library present in the integrated objects being evaluated. To do that, I added the library metadata file as a parameter. From there, we use the integration_input_dir column to define the directory that holds the original SCE objects, and we find them by filtering to the library IDs associated with the params$group being tested. I put the chunk that reads in the individual files and builds the list of individual SCE objects toward the beginning of the notebook, right after reading in the integrated data, since it fits with that step. I also added a check that all the expected files are present, and I wanted that near the top.
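A minimal Python sketch of the file lookup and presence check described above, for illustration only (the report itself is an R notebook working with SCE objects). It assumes the library metadata is a tab-separated file with a `library_id` column, an `integration_group` column standing in for whatever column `params$group` is matched against (both names are hypothetical), and the `integration_input_dir` column from the PR. It also assumes, hypothetically, that each library's SCE is saved as `<library_id>.rds` in that directory.

```python
import csv
from pathlib import Path


def find_library_files(metadata_path, group):
    """Return {library_id: Path} for every library in `group`.

    Hypothetical layout: columns `library_id` and `integration_group`
    (assumed names) plus `integration_input_dir`, with one file per
    library named `<library_id>.rds` (also assumed).
    Raises FileNotFoundError if any expected file is missing.
    """
    with open(metadata_path, newline="") as handle:
        rows = [
            row for row in csv.DictReader(handle, delimiter="\t")
            if row["integration_group"] == group
        ]

    # Build the expected path for each library in the group
    files = {
        row["library_id"]: Path(row["integration_input_dir"]) / f"{row['library_id']}.rds"
        for row in rows
    }

    # Fail early, near the top of the report, if any expected file is absent
    missing = sorted(lib for lib, path in files.items() if not path.exists())
    if missing:
        raise FileNotFoundError(
            f"Missing SCE files for libraries: {', '.join(missing)}"
        )
    return files
```

Checking for every file before any downstream computation means a missing library stops the report immediately instead of silently dropping out of the ARI plot.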

Then, after the batch and cell type ARI sections, I added this new ARI section with one simple plot: integration method on the x-axis and ARI on the y-axis. I didn't think we needed anything fancier here. I'm including two examples from testing this with the simulated data.

Another thing I did here was remove the unintegrated option and instead allow unintegrated to be an integration_method. That makes it much simpler to apply our function across the SCE list we have in the notebook. If people have strong opinions about that, I can change it back. I also didn't write a separate function to calculate and then plot, because it wasn't very complicated. Coming back to this repo after a short break, I like having the code within the notebook rather than trying to parse through layers of functions.

Simulated data with shared cell types: [image]

Simulated data where not all cell types are shared across all batches: [image]
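For readers unfamiliar with the metric, here is a minimal Python sketch of one way a within-batch ARI can be computed. It is an assumption about the approach, not the notebook's R code. For each library, it compares the cluster labels from that library's individual (unintegrated) SCE with the integrated object's cluster labels for the same cells, using the standard Adjusted Rand Index formula. Clustering is assumed to be done already, every integrated labeling is assumed to contain all cells of each library, and the dictionary layout and method names are hypothetical. Treating "unintegrated" as just another key lets a single loop cover every method.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Count same-cluster pairs in each contingency cell, row, and column
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both labelings are all singletons or both a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def within_batch_ari(individual_labels, integrated_labels):
    """Return {(method, library_id): ARI}.

    individual_labels: {library_id: {cell_id: cluster}} from each library's own SCE
    integrated_labels: {method: {cell_id: cluster}} from each integrated object,
        with "unintegrated" treated as just another method (hypothetical layout)
    """
    results = {}
    for method, method_clusters in integrated_labels.items():
        for library_id, library_clusters in individual_labels.items():
            cells = sorted(library_clusters)  # same cell order for both labelings
            results[(method, library_id)] = adjusted_rand_index(
                [library_clusters[c] for c in cells],
                [method_clusters[c] for c in cells],
            )
    return results
```

The resulting values can then be plotted with integration method on the x-axis and ARI on the y-axis, as in the report. A value near 1 means integration preserved that library's own cluster structure.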

I'm going to tag both @jashapiro and @sjspielman and whoever gets to this first can take the review.

allyhawkins commented 1 year ago

I incorporated most of the suggestions. As for adding short markdown comments before hidden code chunks, I added them before the larger, more important chunks but not before all of them. I don't think adding them to every chunk is necessary for internal reports like this. This should be ready for another look.

allyhawkins commented 1 year ago

> I also didn't write a separate function here to calculate and then plot, because it wasn't super complicated. And having to return to this repo after a little break, I like having the code within the notebook rather than trying to parse through layers of functions.

This was part of the reason I didn't write a function here: right now we aren't using it anywhere else, and when I came back to this repo I found it a lot of work to get through the layers of running and plotting functions we have. I did make the change to use aes instead, though. If you would prefer a function, I can write one, but I wanted to note that I was able to make the change without needing it.