AlexsLemonade / scpca-docs

User information about ScPCA processing
https://scpca.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add download files section #24

Closed: allyhawkins closed this 3 years ago

allyhawkins commented 3 years ago

Here I am starting to add the text for the download files section to address #6. I included an overview of what information is in each download and then broke it down into sections based on the four files we are including: the unfiltered counts matrix, the filtered counts matrix, the QC report, and the metadata.

While writing this, I thought it would be helpful to include information on how to actually read in the RDS files and access the different pieces of the SingleCellExperiment (SCE) object, such as the additional metadata we've included. To address this, I went ahead and started adding an FAQ on how to use these files in R.

Additionally, a few questions came up while I was doing this that I would appreciate some feedback on:

  1. Right now we are including the library_metadata.json file for every library in the download, based on what's in the Google Doc, but is this what we want to be doing? I'm not sure what information this file adds for the user beyond the metadata available in libraries_metadata.csv and the metadata in the SCE object, so I don't know if it needs to be user-facing.

  2. I wasn't quite sure what level of detail to include here about the QC report. I kept it fairly simple, describing only what type of file it is and the general information it contains. Do we want a more in-depth explanation of each of the plots here, or somewhere else in the docs?

Also, there are a few points where I included links to other places in the docs, but I don't think I can actually add the links until we have Read the Docs (RTD) set up, so I put in placeholders for now. Let me know if this isn't the case and I can fix it.

jashapiro commented 3 years ago
  1. Right now we are including the library_metadata.json file for every library in the download, based on what's in the Google Doc, but is this what we want to be doing? I'm not sure what information this file adds for the user beyond the metadata available in libraries_metadata.csv and the metadata in the SCE object, so I don't know if it needs to be user-facing.

I am not sure what the full content of libraries_metadata.csv will be. I assume it will have all of the sample information as well as information about the particular libraries, in which case I would agree that the metadata.json files may be redundant. However, I think there are fields in those JSON files that might not be fully represented in the CSV file or in the metadata of the individual files. I would not want that data to live only in the metadata of the SCE objects or the QC report, as those are a bit heavy to parse if someone is only interested in a few individual fields. So I would lean toward keeping them, as they are quite easy to open and read on a case-by-case basis.
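As a rough illustration of the "easy to open case by case" point, here is a minimal Python sketch that pulls a single field out of each library's metadata JSON without loading any SCE objects. It assumes a hypothetical layout with one subfolder per library, each containing a library_metadata.json, and uses made-up library IDs and a made-up field name (total_reads). The real layout and fields are whatever the download ends up containing.

```python
import json
import tempfile
from pathlib import Path


def read_library_field(json_path, field):
    """Return one field from a library metadata JSON file, or None if it is absent."""
    with open(json_path) as f:
        return json.load(f).get(field)


def collect_field(download_dir, field):
    """Map each library folder name to one metadata field.

    Assumes a hypothetical layout: <download_dir>/<library>/library_metadata.json
    """
    return {
        path.parent.name: read_library_field(path, field)
        for path in sorted(Path(download_dir).glob("*/library_metadata.json"))
    }


# Build a throwaway download folder with hypothetical library IDs and a made-up field.
with tempfile.TemporaryDirectory() as tmp:
    for library_id, n_reads in [("LIB01", 1200), ("LIB02", 980)]:
        library_dir = Path(tmp) / library_id
        library_dir.mkdir()
        (library_dir / "library_metadata.json").write_text(
            json.dumps({"library_id": library_id, "total_reads": n_reads})
        )
    reads = collect_field(tmp, "total_reads")

print(reads)  # {'LIB01': 1200, 'LIB02': 980}
```

Each file is read independently, so checking one field for one library never requires touching the larger RDS or HTML files.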

2. I wasn't quite sure what level of detail to include here about the QC report. I kept it fairly simple, describing only what type of file it is and the general information it contains. Do we want a more in-depth explanation of each of the plots here, or somewhere else in the docs?

I hope we don't need much about it. The goal is for it to be pretty self-explanatory. If people have to go to the docs to figure out what we wrote in the QC report, I would be worried.

allyhawkins commented 3 years ago

@jashapiro Thanks for the helpful comments! I went through them and believe I have incorporated most, in addition to some points we discussed in our chat this afternoon.

I believe this was everything, but let me know if I missed anything that we talked about.

kurtwheeler commented 3 years ago

I'll need to make a few tweaks to the code again, but all the data I need for them is there, so it should be easy.

I listed all columns that should be found in every sample but did not include any columns that are project-specific additional metadata.

I had included project-specific fields before, but I can remove them so that what you listed is exactly what users get.

allyhawkins commented 3 years ago

I had included project-specific fields before, but I can remove them so that what you listed is exactly what users get.

No, you should be including the project-specific fields. I just didn't include them in the documentation here because it would be too much to write out a description of every column for every project. I only described the columns that are guaranteed to be found in every file.
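The split between documented core columns and project-specific extras could be checked with a small sketch like the one below. The core column names here are hypothetical placeholders, not the documented set. The function reports which expected columns are missing from a metadata header and which extra, presumably project-specific, columns are present.

```python
import csv
import io

# Hypothetical placeholder for the columns documented as present in every file.
CORE_COLUMNS = ["sample_id", "library_id", "technology"]


def split_columns(header, core=CORE_COLUMNS):
    """Return (missing core columns, project-specific extra columns) for a header."""
    core_set = set(core)
    present = set(header)
    missing = [col for col in core if col not in present]
    extras = [col for col in header if col not in core_set]
    return missing, extras


# Made-up example: one project adds a hypothetical "treatment" column.
csv_text = "sample_id,library_id,technology,treatment\nS1,L1,10Xv3,none\n"
header = next(csv.reader(io.StringIO(csv_text)))
missing, extras = split_columns(header)
print("missing:", missing)  # missing: []
print("extras:", extras)    # extras: ['treatment']
```

Keeping the extras rather than dropping them matches the point above: every file carries the documented core, and anything beyond it is project-specific.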

allyhawkins commented 3 years ago

@jashapiro I went ahead and incorporated your comment. I also moved the link to the readRDS FAQ so it comes after the gene expression file contents link, made a few minor wording changes to go along with your suggestions, and fixed the header inconsistencies. In the FAQ on reading in the RDS files, I also found two sentences that seemed to provide duplicate information, so I combined them into one. Let me know if you notice any other issues!