Open claraqin opened 4 years ago
@lstanish has completed 1 and 5 in the above checklist.
Just a thought about the column order in the metadata table: because the metadata consists of several stackByTable CSVs joined together, the columns are primarily organized by which CSV they came from. For example, the first several columns all relate to the raw data files, and the next several relate to sequencing. We could revise the order in which we join the CSVs so that the columns follow the order in which processing took place in the lab, i.e. raw data files, then DNA extraction, then PCR amplification, and finally marker gene sequencing. I'm more agnostic about the order of columns within these broader groupings.
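To make the reordering idea concrete, here is a minimal sketch, written in Python purely for illustration (the package's own code may look quite different). The stage labels and the column-to-stage mapping are hypothetical placeholders, not the actual metadata schema. Columns are sorted by processing stage, order within a stage is left alone, and unmapped columns go last.

```python
# Processing stages in the order proposed above. The stage labels and the
# column-to-stage mapping below are hypothetical, for illustration only.
STAGE_ORDER = [
    "rawDataFiles",
    "dnaExtraction",
    "pcrAmplification",
    "markerGeneSequencing",
]

COLUMN_STAGE = {
    "rawDataFileName": "rawDataFiles",
    "dnaSampleID": "dnaExtraction",
    "targetGene": "pcrAmplification",
    "sequencerRunID": "markerGeneSequencing",
}


def order_columns(columns):
    """Sort columns by stage; unmapped columns are placed after all stages."""
    rank = {stage: i for i, stage in enumerate(STAGE_ORDER)}

    def key(col):
        # Columns without a known stage get the largest rank.
        return rank.get(COLUMN_STAGE.get(col), len(STAGE_ORDER))

    # sorted() is stable, so the existing order within each stage is kept.
    return sorted(columns, key=key)


ordered = order_columns(
    ["sequencerRunID", "uid", "targetGene", "rawDataFileName", "dnaSampleID"]
)
print(ordered)
```

Because the sort is stable, this only moves whole groups around, which matches the point that the order within each grouping matters less.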
@claraqin qcMetadata function ready for testing! The function is in the code folder. Current functionality:
To add:
Other functionality that would be good to add:
Other tests to run:
@claraqin @lstanish Tested this function and pushed a small change: the output is now a dataframe, which can be used the same way as the input dataframe.
As you mentioned, the QC function cannot handle a test dataset that was generated using targetGene="all". Perhaps in that case, the QC function could loop over the target genes, essentially running twice to create an ITS output and a 16S output, and then either combine them back into an "all" dataframe or return both separately as a list.
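A rough sketch of that loop, in Python for illustration only: `qc_one_gene` is a hypothetical stand-in for the real per-gene QC step (here it just selects the rows for one gene), and the `combine` switch is an assumed option, not an existing parameter.

```python
def qc_one_gene(records, gene):
    """Hypothetical placeholder for per-gene QC: keeps only rows for `gene`."""
    return [r for r in records if r["targetGene"] == gene]


def qc_all_genes(records, genes=("ITS", "16S"), combine=True):
    """Run the per-gene QC once per target gene.

    Returns one combined list when combine=True, otherwise a dict
    mapping each gene to its own QC output.
    """
    per_gene = {gene: qc_one_gene(records, gene) for gene in genes}
    if combine:
        # Concatenate the per-gene outputs in the order given by `genes`.
        return [row for gene in genes for row in per_gene[gene]]
    return per_gene


records = [
    {"dnaSampleID": "A", "targetGene": "16S"},
    {"dnaSampleID": "B", "targetGene": "ITS"},
    {"dnaSampleID": "C", "targetGene": "16S"},
]
combined = qc_all_genes(records)
separate = qc_all_genes(records, combine=False)
print(len(combined), sorted(separate))
```

Returning the per-gene outputs separately keeps the 16S and ITS data apart for anything downstream that expects them split, while the combined form mirrors the "all" input.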
@zoey-rw Thanks for making that update to output a dataframe as well as a hard-copy file! It's good to know that's a useful output. I am curious how this function will behave if you use the params file to output the QCed data. Did you happen to test that?
Regarding targetGene='all': is supporting it a useful feature? I ask because the data need to be parsed by targetGene for dada2, and all of the downstream analyses keep the 16S and ITS data separate. It's definitely possible, and wouldn't be hard, to allow the function to QC 16S and ITS in the same function call. I'm just wondering whether that's something users will want to do.
@claraqin Added a user option to remove records containing a NEON data flag (any of the qaqcStatus fields, and dataQF).
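For anyone testing this option, the filtering logic might look roughly like the sketch below (Python, for illustration only). It rests on assumptions that should be checked against the actual data: that qaqcStatus fields are the columns whose names end in "qaqcStatus", that a value of "Fail" marks a flagged record, and that any non-empty dataQF counts as a flag. The column names and flag values in the example rows are hypothetical.

```python
def is_flagged(record):
    """Return True if the record carries any flag.

    Assumptions (not confirmed by the package): a qaqcStatus column is
    flagged when its value is "Fail", and dataQF is flagged when non-empty.
    """
    for name, value in record.items():
        if name.lower().endswith("qaqcstatus") and value == "Fail":
            return True
    return record.get("dataQF") not in (None, "")


def remove_flagged(records, remove=True):
    """Drop flagged records when remove=True; otherwise return them all."""
    if not remove:
        return list(records)
    return [r for r in records if not is_flagged(r)]


rows = [
    {"dnaSampleID": "A", "qaqcStatus": "Pass", "dataQF": ""},
    {"dnaSampleID": "B", "qaqcStatus": "Fail", "dataQF": ""},
    {"dnaSampleID": "C", "qaqcStatus": "Pass", "dataQF": "exampleFlag"},
    {"dnaSampleID": "D", "extractionQaqcStatus": "Fail", "dataQF": None},
]
kept = remove_flagged(rows)
print([r["dnaSampleID"] for r in kept])
```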
@claraqin @zoey-rw Made one minor update to the error message shown when outDir="" and pushed the update. Any luck testing any of the unchecked items above?
We need to make the following changes to the workflow, particularly in the Download NEON Data vignette, to prevent QC-related issues from complicating processes downstream.
In addition, @lstanish suggests that it could be good to reorganize the columns in the metadata table so the most important columns come first. What are some columns to put first in the metadata table?