EuracBiomedicalResearch / batch_centroid

Utility functions to perform batch centroiding of profile mzML files using MSnbase

Folder structure for the profile mzML files #2

Open jorainer opened 6 years ago

jorainer commented 6 years ago

@SiggiSmara, I would prefer a structure that is as flat as possible. But this obviously also depends on how the Sciex software stores the data.

I would propose:
A) all files in one folder, or
B) files organized by year and month, e.g. all files measured in May 2018 go to /2018/05 or /2018-05/.

SiggiSmara commented 6 years ago

That is the current structure, i.e. year_mo, but the date it is based on comes from the first 6 digits of each file name. Usually this is also the day the data were generated. I still have to test whether this holds for all batches, though.
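A minimal Python sketch of the year_mo mapping described above. It assumes that the six-digit prefix is a YYMMDD date, that all data are from the 2000s, and that folders are named like `2018_05`. The example file name and the helpers `year_mo_folder` and `target_path` are hypothetical, not part of batch_centroid.

```python
import re
from pathlib import Path

# Assumption: file names start with a six-digit YYMMDD date stamp,
# e.g. the hypothetical "180523_batch01_sample001.wiff".
DATE_PREFIX = re.compile(r"^(\d{2})(\d{2})(\d{2})")


def year_mo_folder(filename: str) -> str:
    """Return the year_mo folder name (e.g. "2018_05") for a file name."""
    match = DATE_PREFIX.match(Path(filename).name)
    if match is None:
        raise ValueError(f"no six-digit date prefix in {filename!r}")
    yy, mm, _dd = match.groups()
    if not 1 <= int(mm) <= 12:
        raise ValueError(f"invalid month {mm!r} in {filename!r}")
    # Assumption: two-digit years refer to 20xx.
    return f"20{yy}_{mm}"


def target_path(root: str, filename: str) -> Path:
    """Build <root>/<year_mo>/<file name> for a single file."""
    return Path(root) / year_mo_folder(filename) / Path(filename).name
```

For example, `year_mo_folder("180523_batch01_sample001.wiff")` gives `"2018_05"`. Raising on a missing or invalid prefix makes the files that do not follow the naming scheme easy to spot, which is exactly what still needs to be checked for all batches.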

Putting all files in one folder would result in a folder containing somewhere between 20,000 and 30,000 wiff files. I'm not sure how easy it is to access such a folder; I simply don't have experience with that.

Plus, if we include the calibration files, that adds another 5,000 to 10,000. We haven't talked about these so far, but I have an idea that we need to test. If my hunch is right, they will come in handy for both within-batch and between-batch corrections. The calibration sample is essentially a mixture of very stable analytes (same concentration, no relation to the samples or the sample preparation) that is measured every 3 runs. It could be a way to account for the overall sensitivity of the instrument on the day the batch is run, as well as for changes in instrument sensitivity during the batch.

jorainer commented 6 years ago

Wow! Yes, these calibration files sound pretty interesting! We will need them too! Regarding files, I am absolutely fine with year_mo. Let's go for that. I also talked to our data manager (Hagen). He is fine with any solution we come up with. I also tried to talk him into having a central database with all sample-to-file associations; Fuxi would need that too. I think it would be nice if we could come up with a general (simple) solution here.

Btw, my tests on the cluster are, well, not working that well, because Eva uses all of the nodes at the moment, and her jobs also need I/O. So, bad timing for benchmarks...

SiggiSmara commented 6 years ago

> Btw, my tests on the cluster are, well, not working that well, because Eva uses all of the nodes at the moment, and her jobs also need I/O. So, bad timing for benchmarks...

Always the troublemakers, these bioinformaticians 😉 The bottleneck right now seems to be the wiff-to-profile conversion anyway, which will take a week or more (worst-case scenario), so we can wait a little.

SiggiSmara commented 6 years ago

I'll take a look at the calibration file folder to see how many files there are and how they are organized. I'm not sure they are on the bbnas, since we previously thought they would not be that important.