Closed claraqin closed 4 years ago
@claraqin Questions on modifying the metadata downloading code:
Sorry for the slow response.
params.R. You should only need to specify the ones that are mentioned in the arguments of downloadRawSequenceData() and downloadSequenceMetadata().
@claraqin I created a new version of the metadata downloading function, called downloadSequenceMetadataRev(). Did a bit of testing, but would be great for you and/or others in the group to also test and provide feedback.
Will shift to working on a revised sequence data downloading function that uses neonUtilities.
@claraqin Thanks for the code additions, the new functionality is great! Here's a list of the most recent updates:
@claraqin one other question: at some point do you want to clean up the older versions of functions in utils.R? Would be good to update the name of this function so that it isn't a 'rev' anymore. No rush, just wanted to write it down so we don't forget!
Hi Lee,
Thanks for bringing this up! I'll clean up the older versions in my next commit. And sorry for the slow response – I don't know how to receive notifications for Issue thread replies. I'm making a note to figure that out too.
Clara
I think this has been resolved as of the most recent commit, which gives downloadSequenceMetadataRev the ability to handle tarballs.
Improvements should be made to the sequence data downloading functions in utils.R to make them more robust. Currently they assume a standard file naming structure, so they may break on minor changes in naming conventions.
Suggestions from @lstanish:
In my experience the best way to ensure you are filtering for the correct sequence data is to start by downloading the metadata and doing some table joining and filtering. Here’s some example code using neonUtilities that combines the respective sequencing data table (16S or ITS) with the raw data files table. What you end up with is a data.frame containing the rawDataFileNames for just the 16S or ITS data.
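A minimal sketch of the join described above, not the code originally posted in this thread. The helper name joinSeqToRawFiles() and the toy tables are made up for illustration, and the join key dnaSampleID plus the table names in the comments (mmg_soilMarkerGeneSequencing_16S, mmg_soilRawDataFiles, product DP1.10108.001) are assumptions about how the NEON marker gene product is laid out, so check them against what loadByProduct() actually returns.

```r
# Hypothetical helper: inner-join a sequencing table (16S or ITS) to the raw
# data files table, so only raw files belonging to that target gene remain.
# ASSUMPTION: both tables share a sample identifier column ("dnaSampleID").
joinSeqToRawFiles <- function(seq_tbl, raw_tbl, key = "dnaSampleID") {
  stopifnot(key %in% names(seq_tbl), key %in% names(raw_tbl))
  # Drop raw-table columns that duplicate sequencing-table columns (other
  # than the key) so merge() does not create .x/.y suffixed copies.
  dup_cols <- setdiff(intersect(names(seq_tbl), names(raw_tbl)), key)
  raw_tbl <- raw_tbl[, setdiff(names(raw_tbl), dup_cols), drop = FALSE]
  merge(seq_tbl, raw_tbl, by = key)  # merge() defaults to an inner join
}

# With real data the two tables might be obtained roughly like this
# (ASSUMPTION: product ID and table names; not run here):
#   mmg <- neonUtilities::loadByProduct(dpID = "DP1.10108.001", site = "all",
#                                       package = "expanded", check.size = FALSE)
#   seq16S   <- mmg$mmg_soilMarkerGeneSequencing_16S
#   rawFiles <- mmg$mmg_soilRawDataFiles

# Toy stand-ins so the sketch runs without a download.
seq16S <- data.frame(
  dnaSampleID    = c("A", "B"),
  sequencerRunID = c("run1", "run2"),
  stringsAsFactors = FALSE
)
rawFiles <- data.frame(
  dnaSampleID     = c("A", "A", "B", "C"),
  rawDataFileName = c("A_R1.fastq", "A_R2.fastq", "B_R1.fastq", "C_R1.fastq"),
  stringsAsFactors = FALSE
)

# Sample "C" has no 16S sequencing record, so its raw file is dropped.
joined16S <- joinSeqToRawFiles(seq16S, rawFiles)
```

Joining on a sample identifier rather than parsing file names is what makes this approach less sensitive to naming-convention changes.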
From here you can subset the data to include only the run or samples of interest.
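The subsetting step could look something like the sketch below. The column names sequencerRunID, dnaSampleID, and rawDataFileName are assumptions about the joined table, and the toy data.frame stands in for the real join result.

```r
# Toy stand-in for the joined sequencing + raw data files table.
# ASSUMPTION: column names mirror those expected in the real joined table.
joined <- data.frame(
  dnaSampleID     = c("A", "A", "B"),
  sequencerRunID  = c("run1", "run1", "run2"),
  rawDataFileName = c("A_R1.fastq", "A_R2.fastq", "B_R1.fastq"),
  stringsAsFactors = FALSE
)

# Keep only the sequencing run(s) of interest.
runs_of_interest <- c("run1")
subset_by_run <- joined[joined$sequencerRunID %in% runs_of_interest, ]

# Or keep only specific samples.
samples_of_interest <- c("B")
subset_by_sample <- joined[joined$dnaSampleID %in% samples_of_interest, ]

# The file names to pass on to a download step.
files_to_download <- unique(subset_by_run$rawDataFileName)
```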