aifimmunology / MOCHA

R package for single-cell Open Chromatin Identification & Downstream Analysis
https://aifimmunology.github.io/MOCHA/
GNU General Public License v3.0

Enhancement: Clean up data input #119

Open markphillippebworth opened 1 year ago

markphillippebworth commented 1 year ago

Right now, the input requires a list of samples and cell types, with specific separator characters for each.

Natively, everyone (Signac, SnapATAC, etc.) will have a list of fragments.tsv.gz files and a metadata table of cell names, samples, and cell types. Even for ArchR, we have that.

It isn't reasonable to expect the user to sort the GRanges list by sample and cell type and separate them with our specific characters. It is reasonable for them to read those fragments.tsv.gz files into R as a GRangesList. All they need to do is make sure that the names of the GRangesList match the samples in the metadata, and that the cell names match as well. We can test for this and throw an informative error.

Instead, the user should pass a GRangesList where each element is one sample (mixed cell types), plus a metadata object from which we parse out cell name and cell type.
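To make the proposed input shape concrete, here is a rough sketch of how the metadata could drive the split by cell type. It is written in Python with plain dictionaries only so that it is self-contained; it is not MOCHA code. The function `split_by_cell_type`, the field names, and the barcodes are all hypothetical, and the `fragments` dictionary stands in for a GRangesList named by sample.

```python
from collections import defaultdict


def split_by_cell_type(fragments, metadata):
    """Group each sample's fragments by the cell type recorded in the metadata.

    fragments: dict mapping sample name -> list of fragment records, each a
               dict with at least a "cell" key (hypothetical stand-in for
               one GRanges per sample).
    metadata:  list of dicts with "cell", "sample", and "cell_type" keys
               (hypothetical column names).
    Returns a dict keyed by (sample, cell_type) tuples.
    """
    # Key the lookup by (sample, cell) so a barcode reused in two samples
    # is not confused between them.
    cell_type_of = {
        (row["sample"], row["cell"]): row["cell_type"] for row in metadata
    }

    grouped = defaultdict(list)
    for sample, records in fragments.items():
        for record in records:
            key = (sample, record["cell"])
            if key in cell_type_of:  # cells without metadata are skipped here
                grouped[(sample, cell_type_of[key])].append(record)
    return dict(grouped)


# Made-up example data: two samples, mixed cell types within sampleA.
fragments = {
    "sampleA": [
        {"chrom": "chr1", "start": 100, "end": 250, "cell": "AAAC-1"},
        {"chrom": "chr1", "start": 900, "end": 1100, "cell": "TTTG-1"},
    ],
    "sampleB": [
        {"chrom": "chr2", "start": 50, "end": 180, "cell": "AAAC-1"},
    ],
}
metadata = [
    {"cell": "AAAC-1", "sample": "sampleA", "cell_type": "B cell"},
    {"cell": "TTTG-1", "sample": "sampleA", "cell_type": "T cell"},
    {"cell": "AAAC-1", "sample": "sampleB", "cell_type": "T cell"},
]

groups = split_by_cell_type(fragments, metadata)
```

With this shape the user never has to encode sample and cell type into names with special separator characters; the grouping key is built internally from the metadata.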

We check to make sure the sample names align, and the cell names align between GRanges and metadata.
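A minimal sketch of that check, again in plain Python rather than R and not taken from the MOCHA codebase. The function `check_inputs` and the field names are hypothetical, and the exact policy is an assumption: here a sample that is missing from the metadata, or a sample whose cell names have no overlap with the metadata, is treated as an error.

```python
def check_inputs(fragments, metadata):
    """Raise ValueError with an informative message if names do not line up.

    fragments: dict mapping sample name -> list of records with a "cell" key.
    metadata:  list of dicts with "cell" and "sample" keys.
    (Both shapes are hypothetical stand-ins for a GRangesList and a metadata table.)
    """
    frag_samples = set(fragments)
    meta_samples = {row["sample"] for row in metadata}

    # Every sample in the fragment list must appear in the metadata.
    missing = sorted(frag_samples - meta_samples)
    if missing:
        raise ValueError(
            "Samples in the fragment list but not in the metadata: "
            + ", ".join(missing)
        )

    # Within each sample, at least some cell names must match the metadata.
    for sample, records in fragments.items():
        frag_cells = {record["cell"] for record in records}
        meta_cells = {row["cell"] for row in metadata if row["sample"] == sample}
        if not frag_cells & meta_cells:
            raise ValueError(
                f"No cell names in sample '{sample}' match the metadata; "
                "check that barcodes use the same format in both inputs."
            )
    return True


# Made-up example data.
metadata = [
    {"cell": "AAAC-1", "sample": "sampleA"},
    {"cell": "TTTG-1", "sample": "sampleA"},
]
good = {"sampleA": [{"cell": "AAAC-1"}, {"cell": "TTTG-1"}]}
bad = {"sampleX": [{"cell": "AAAC-1"}]}

ok = check_inputs(good, metadata)

try:
    check_inputs(bad, metadata)
    message = None
except ValueError as err:
    message = str(err)  # names the offending sample
```

Whether partial overlap of cell names should warn, error, or pass silently is a design decision this sketch does not settle.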

We already do this for ArchR within getPopFrags (or at least we did), so we don't need to write new code from scratch.