biocore / LabControl

lab manager for plate maps and sequence flows
BSD 3-Clause "New" or "Revised" License

Change sample sheet Sample_ID and Sample_Name to use actual sample id instead of sample "content"? #237

Open AmandaBirmingham opened 6 years ago

AmandaBirmingham commented 6 years ago

I have split this issue off of #204 because I think it may need more discussion before implementation.

https://github.com/jdereus/labman/issues/204#issuecomment-384009022
"@tanaes Also, this question is not directly related to producing the project shortname, but I just want to double-check: what you want as the Sample_ID, Sample_Name, etc, in the shotgun sample sheet are strings that are the sample ids plus the plate and well they were plated on in the original sample plate? As in, "1_SKB1_640202_21_A1" (where the actual sample_id in qiita.study_sample is "1_SKB1_640202")?"

https://github.com/jdereus/labman/issues/204#issuecomment-384010632
"We don't actually need to have the plate info -- just the study + sample identifier is ok, I don't want to encode extraneous data in the filename. TBH I'd rather have a non-human-readable unique identifier but I don't think that will work in our system."

https://github.com/jdereus/labman/issues/204#issuecomment-384014159
"@tanaes Just to be clear, my question above is about the contents of the "Sample_ID" and "Sample_Name" columns in the shotgun sample sheet that LabPerson generates; as far as I know, these values aren't file names (or are you saying they are used as that, somewhere downstream)? As I said, this question strays a bit from the task of generating the project shortname; sorry :)

I just wanted to double-check that, whatever you use the sample sheet for after getting back the sequencing data, you actively want these "sample id plus position" descriptors in it rather than the actual keys that would allow you to, say, look up the sample metadata in Qiita (without having to strip off extraneous plate id and well position pieces at the end of the string). If you DO want the "sample id plus position" info (or you just don't care :) ), then all is copacetic. If actual Qiita sample ids would be more useful to you, it would be very easy to put them in the shotgun sample sheet instead of the "sample id plus position" ids."
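To illustrate the "strip off extraneous plate id and well position pieces" step mentioned above, here is a minimal sketch. It assumes the content string has the layout `<sample id>_<plate id>_<well>` seen in the example "1_SKB1_640202_21_A1", with a numeric plate id and a well written as a row letter plus column number. The function name `split_content_id` is hypothetical and not part of labman.

```python
import re

# Assumed layout: <qiita_sample_id>_<plate_id>_<well>, e.g. "1_SKB1_640202_21_A1".
# Assumptions: the plate id is all digits, the well is a row letter (A-P)
# followed by a one- or two-digit column number.
_POSITION_SUFFIX = re.compile(r"_(?P<plate>\d+)_(?P<well>[A-P]\d{1,2})$")


def split_content_id(content_id):
    """Return (sample_id, plate_id, well).

    plate_id and well are None when no position suffix is found.
    """
    match = _POSITION_SUFFIX.search(content_id)
    if match is None:
        return content_id, None, None
    sample_id = content_id[:match.start()]
    return sample_id, match.group("plate"), match.group("well")


print(split_content_id("1_SKB1_640202_21_A1"))
```

Note that this kind of parsing is inherently fragile: a sample id that itself happens to end in something shaped like `_21_A1` would be split incorrectly, which is one argument for putting the actual sample id in the sheet.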

https://github.com/jdereus/labman/issues/204#issuecomment-384994491
"Sample names: Based on how the pipeline currently works, I think providing the actual Qiita Sample IDs to the format_sample_sheet machinery is going to be best. They do end up being munged into filenames (we end up encoding both the Qiita sample ID and the Illumina BCL2Fastq-compatible name in the sample sheet, and the latter is what gets prepended to the fastq filename), but we want to retain the original Qiita ID for when we rename these files later on. Currently this process is all already handled, and so I don't think [sic]"
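A rough sketch of the "encode both names" idea from the comment above, so the original id can be recovered when files are renamed later. This is not the actual format_sample_sheet code. It assumes a compatible name may contain only letters, digits, hyphens, and underscores, and the helper names `bcl2fastq_compatible_name` and `build_name_map` are hypothetical.

```python
import re


def bcl2fastq_compatible_name(sample_id):
    """Replace every character other than letters, digits, '-' and '_' with '_'.

    Assumption: this character set is what the demultiplexer accepts.
    """
    return re.sub(r"[^0-9A-Za-z_-]", "_", sample_id)


def build_name_map(sample_ids):
    """Map each compatible name back to its original sample id.

    Raises ValueError if two different ids collapse to the same compatible
    name, since the later rename step could not tell them apart.
    """
    name_map = {}
    for sample_id in sample_ids:
        name = bcl2fastq_compatible_name(sample_id)
        if name in name_map and name_map[name] != sample_id:
            raise ValueError(
                f"{sample_id!r} and {name_map[name]!r} both map to {name!r}"
            )
        name_map[name] = sample_id
    return name_map


print(build_name_map(["1.SKB1.640202", "1.SKB2.640194"]))
```

Keeping the map explicit, rather than trying to reverse the substitution, matters because the replacement is lossy: "a.b" and "a_b" produce the same compatible name.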

AmandaBirmingham commented 6 years ago

So, I started to make this change to the code and then realized that the dang blanks/controls/etc would be an issue with this change too, since they don't HAVE Qiita sample ids. If we decide how we'd like those black sheep to be represented, this can probably be done easily.
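For discussion only, one possible way to represent the blanks and controls is to fall back to a position-based identifier whenever no Qiita sample id exists. This is a sketch of one option and not a decision. The function `sample_sheet_id` and its parameters are hypothetical and do not correspond to existing labman code.

```python
def sample_sheet_id(qiita_sample_id, content, plate_id, well):
    """Pick the identifier to write to the sample sheet.

    Real samples use their Qiita sample id unchanged. Blanks and controls,
    which have no Qiita sample id (passed as None), fall back to a
    "<content>_<plate id>_<well>" string so they stay unique across plates.
    """
    if qiita_sample_id is not None:
        return qiita_sample_id
    return f"{content}_{plate_id}_{well}"


print(sample_sheet_id("1_SKB1_640202", "1_SKB1_640202", 21, "A1"))
print(sample_sheet_id(None, "blank", 21, "H12"))
```

The drawback is that the sheet would then mix two kinds of identifier, so anything downstream would need a way to tell a fallback id from a real sample id.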

tanaes commented 6 years ago

fundamentally the problem is that these controls don't exist in the study that's uploaded, correct? they're added later. I don't really know how to decide how to handle this.

@antgonza @ElDeveloper any input?

if they're to be analyzed in Qiita it seems like they would somehow need to be added to the sample sheet somewhere. I don't know how that happens given the current model.

ElDeveloper commented 6 years ago

"fundamentally the problem is that these controls don't exist in the study that's uploaded, correct? they're added later."

You are correct.

I think they need to be added and associated when a plate is created, as that's the point where membership is assigned. To do this, labman will need to enter the values either directly into the Qiita database or via a REST API.

josenavas commented 6 years ago

@tanaes you're right - this is just pointing out a flaw in the Qiita DB structure: technical replicates are not correctly supported there. Entering the values directly into the Qiita database ends up duplicating a fair amount of code (e.g. all the sample/prep text files need to be regenerated). Doing it via the REST API may be the easiest and most correct way of doing it, since Qiita will still be managing everything that it sees.

AmandaBirmingham commented 5 years ago

Really not sure whether this one needs to be done anymore; would need careful discussion with current stakeholders/dev team before addressing.