allyhawkins closed this 2 years ago.
I removed the color from the bars above the plots, so there's no longer any combining of plots, and the plots should now be consistent across all of them. I also made the minor edits to the `aws s3 cp` statement for the metadata and removed the extra filtering step for the `rowData`. I now filter only for genes detected in > 5% of cells. Then, when spreading the mean values into individual columns, I use `drop_na()` to drop any genes that are not found in both Alevin-fry and Cell Ranger, rather than using the additional step.
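The actual R/tidyverse code isn't shown in this thread, so here is a minimal Python sketch of the filtering logic described above. Several pieces are hypothetical: the input layout (one row per tool and gene, holding a mean expression value and the fraction of cells the gene was detected in), the tool labels, and the function name `filter_spread_drop_na`. Missing values after spreading are represented as `None`, standing in for R's `NA`.

```python
from typing import List, Tuple

# Hypothetical row layout: (tool, gene_id, mean_expression, fraction_of_cells_detected)
Row = Tuple[str, str, float, float]
TOOLS = ("alevin-fry", "cellranger")  # hypothetical tool labels


def filter_spread_drop_na(rows: List[Row], min_detected: float = 0.05) -> List[tuple]:
    """Filter by detection, spread means into one column per tool, drop incomplete genes."""
    # 1. Keep only genes detected in more than min_detected of cells (checked per tool)
    kept = [r for r in rows if r[3] > min_detected]
    # 2. Spread: one row per gene, one mean-expression column per tool;
    #    a tool with no value for that gene gets None (the NA equivalent)
    genes = sorted({gene for _, gene, _, _ in kept})
    means = {(tool, gene): m for tool, gene, m, _ in kept}
    wide = [(gene, *[means.get((t, gene)) for t in TOOLS]) for gene in genes]
    # 3. drop_na()-style step: discard any gene missing a value from either tool
    return [row for row in wide if all(v is not None for v in row[1:])]


rows: List[Row] = [
    ("alevin-fry", "GENE_A", 2.0, 0.50),
    ("cellranger", "GENE_A", 2.2, 0.48),
    ("alevin-fry", "GENE_B", 0.01, 0.02),  # detected in < 5% of cells
    ("cellranger", "GENE_B", 0.02, 0.03),
    ("alevin-fry", "GENE_C", 1.0, 0.30),   # only quantified by one tool
]
result = filter_spread_drop_na(rows)
print(result)
```

Because the detection filter is applied per tool, a gene that passes in Alevin-fry but not in Cell Ranger also ends up with a missing value after spreading, and the final step drops it.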
The formatting looks good to me, but before I approve, I want to note that the correlation plot has changed substantially from the previous version. I'm not sure why that happened. Based on my understanding of the transformations that were done, I would not have expected a change, but it looks like the lowest-expression genes are now being excluded?
I noticed this too. After going through it multiple times to find where the two methods could differ, it looks like there was an error in the original plot: we weren't removing the low-coverage genes, even though we should have been. I triple-checked, and once the low-coverage genes are removed (by filtering out genes detected in < 5% of cells), the correlation plots look like the ones that are now committed. This is true whether or not I include the additional step of filtering to genes found in both tools before creating the spread-out data frame. I will adjust the size to match the other graphs, though, and then add the new plot.
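The reasoning above can be checked on toy data with a hypothetical Python sketch. The data and helper names are illustrative and not taken from the analysis script. It compares two orderings: restricting to shared genes before spreading, versus spreading first and then dropping genes that are missing from either tool. Once the detection filter is applied, both orderings give the same genes. Without that filter, a low-coverage gene detected by both tools survives into the correlation plot.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (tool, gene_id, mean_expression, fraction_of_cells_detected)
Row = Tuple[str, str, float, float]
TOOLS = ("alevin-fry", "cellranger")  # hypothetical tool labels


def keep_detected(rows: List[Row], min_detected: float = 0.05) -> List[Row]:
    """Remove low-coverage genes: keep rows detected in more than min_detected of cells."""
    return [r for r in rows if r[3] > min_detected]


def spread(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """One entry per gene, with one mean-expression 'column' per tool."""
    wide: Dict[str, Dict[str, float]] = {}
    for tool, gene, mean_expr, _ in rows:
        wide.setdefault(gene, {})[tool] = mean_expr
    return wide


def shared_genes_first(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Extra step: restrict to genes present for every tool, then spread."""
    per_tool = [{gene for tool, gene, _, _ in rows if tool == t} for t in TOOLS]
    shared = set.intersection(*per_tool)
    return spread([r for r in rows if r[1] in shared])


def drop_incomplete_after(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """drop_na()-style: spread first, then drop genes missing any tool's value."""
    return {g: cols for g, cols in spread(rows).items() if all(t in cols for t in TOOLS)}


rows: List[Row] = [
    ("alevin-fry", "GENE_A", 2.0, 0.50),
    ("cellranger", "GENE_A", 2.2, 0.48),
    ("alevin-fry", "GENE_B", 0.01, 0.02),  # low coverage in both tools
    ("cellranger", "GENE_B", 0.02, 0.03),
    ("alevin-fry", "GENE_C", 1.0, 0.30),   # present in only one tool
]

filtered = keep_detected(rows)
# With the detection filter applied, the order of the shared-gene step doesn't matter
same_result = shared_genes_first(filtered) == drop_incomplete_after(filtered)
# Without the detection filter, low-coverage GENE_B is kept and would be plotted
unfiltered_genes = sorted(drop_incomplete_after(rows))
print(same_result, unfiltered_genes)
```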
Based on feedback received in AlexsLemonade/scpca-docs#21, I am breaking out the script used to create the figures needed for the FAQ "Why did we use Alevin-fry?" I am also storing the figures here and will include a permalink to them in the docs, rather than moving them over to the `scpca-docs` repo.
This script is hard-coded to take as input the results from the previous benchmarking analysis we did with Alevin-fry (using `cr-like` with selective alignment) and Cell Ranger for two single-cell and single-nuclei samples. As part of that benchmarking analysis, these SCE objects were generated using `analysis/quantifier-comparisons/benchmarking_generate_qc_df.R`.
I'm creating three plots to compare Alevin-fry to Cell Ranger. Two are density plots showing the distributions of UMIs per cell and genes detected per cell. The third is a scatter plot showing the correlation of mean gene expression between the two tools in each sample. After talking with Josh about the plots, we decided to switch to density plots to better compare the two distributions, and to show them on a log scale. I also changed the plots to be labeled with the library ID and the sample type (cell or nucleus).
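The quantities behind these plots can be sketched in Python on a toy gene-by-cell count matrix for a single library. This assumes that UMIs per cell is the column sum of counts and that genes detected per cell is the number of genes with a nonzero count. The matrix, the per-gene means, and the use of a Pearson coefficient are illustrative assumptions, not values or choices taken from the script.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy count matrix for one library: rows are genes, columns are cells
counts: List[List[int]] = [
    [5, 0, 2],
    [1, 3, 0],
    [0, 7, 8],
    [0, 0, 1],
]


def per_cell_metrics(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (UMIs per cell, genes detected per cell) for a gene-by-cell matrix."""
    n_cells = len(matrix[0])
    umis = [sum(row[j] for row in matrix) for j in range(n_cells)]
    genes = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]
    return umis, genes


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


umis, genes = per_cell_metrics(counts)
# Values for the density plots, log10-transformed for comparison on a log scale
log10_umis = [math.log10(u) for u in umis]
log10_genes = [math.log10(g) for g in genes]

# Hypothetical per-gene mean expression for the same genes from each tool
fry_means = [2.0, 1.0, 0.5]
cellranger_means = [2.2, 0.9, 0.6]
r = pearson(fry_means, cellranger_means)
print(umis, genes, round(r, 3))
```

UMI counts per cell are typically right-skewed, so one way to make the two tools' distributions easier to compare in a density plot is to plot log10 values or use a log-scaled axis.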
I'm attaching the files here for easy review: