charlesgregory closed 2 years ago
We also shouldn't assume "CLL" to be present in the sample name in the data table. If the "sample" column isn't present in the data file to be converted, then we can assume it is a pivoted table where the sample names to be converted are in the column names.
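A minimal sketch of that fallback, assuming the table is already loaded as a header list plus row lists and that the index has been reduced to a plain dict of old name to new name. `convert_samples` and its parameters are hypothetical names for illustration, not part of the existing script:

```python
def convert_samples(header, rows, mapping, sample_col="sample"):
    """Return (new_header, new_rows) with sample names converted via mapping.

    Hypothetical helper: names not found in mapping are left unchanged,
    and no substring such as "CLL" is assumed in the sample names.
    """
    if sample_col in header:
        # Long format: convert the values stored in the sample column.
        idx = header.index(sample_col)
        new_rows = [
            row[:idx] + [mapping.get(row[idx], row[idx])] + row[idx + 1:]
            for row in rows
        ]
        return list(header), new_rows

    # No sample column: treat the table as pivoted, so the sample names
    # are the column names themselves and only the header is converted.
    new_header = [mapping.get(name, name) for name in header]
    return new_header, [list(row) for row in rows]
```

With this shape, a column such as "gene" in a pivoted table passes through untouched simply because it has no entry in the mapping.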
Before I start editing the script, I need to know all the variables ahead of time, like this. Are there any more files we are using as the index? I was previously under the impression that there was only one master file of indices. Is this being used to edit other file types besides those we talked about before, ones that break the established pattern (there will only be a single column or a header needing conversion, but not both)? I know you brought up this idea of pivot tables in the meeting the three of us had, but given the information at the time it seemed fine to set a single --column variable. Let me know if I misunderstood.
I also think both of these scenarios you posted commands for are already working with the current script? We should discuss this on Monday.
@charlesgregory Alright, I've actually done some updates on it since I had to add in the correct method for splitting or not splitting in conversion. I've just gone ahead and added in the flags you wanted, but please let me know if there are more specs to include. Otherwise this can likely be closed.
We should expect to always have a "sample" column in our data file(s) unless it is a pivoted table, but we have no expectations for the index/key file to have specific columns. I think it currently relies on the columns "accession" and "status" being present? So we really need two flags instead of just `--column`, or `--column` needs to take a parsed value like `colname1=colname2`.
`samples.xlsx`:
In which case we would need to run the script as such:
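For illustration only, the `colname1=colname2` form of `--column` could be parsed along these lines. This is a sketch, not the script under discussion: `parse_column_pair` and `build_parser` are hypothetical names, and pairing "accession" with "status" in the demo call is an arbitrary choice.

```python
import argparse


def parse_column_pair(value):
    """Split 'colname1=colname2' into a (colname1, colname2) tuple."""
    first, sep, second = value.partition("=")
    if not sep or not first or not second:
        # argparse turns this into a normal usage error on the command line.
        raise argparse.ArgumentTypeError(
            f"expected colname1=colname2, got {value!r}"
        )
    return first, second


def build_parser():
    # Hypothetical parser: only the --column flag from the discussion is shown.
    parser = argparse.ArgumentParser(description="Convert sample names.")
    parser.add_argument(
        "--column",
        type=parse_column_pair,
        required=True,
        metavar="COLNAME1=COLNAME2",
        help="index/key file columns to map from and to",
    )
    return parser


# Demo call with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--column", "accession=status"])
print(args.column)
```

A single parsed flag keeps the command short; two separate flags would be the alternative if the column names themselves could contain "=".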