EstherPlomp / TNW-RDM-101

Self-paced materials of the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Assignment 1 Maurits Houmes #39

Closed - mausi122 closed this issue 1 year ago

mausi122 commented 1 year ago

Introduction

Hi all, I'm Maurits Houmes, a 3rd-year PhD student at QN. I have a dog and more hobbies/interests than I have time for.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My research revolves around investigating the material properties of two-dimensional materials through nanomechanical means. We do this by creating nanodrums of the materials of interest and then looking at the resonance frequencies of these drums. Since the resonance frequencies of these systems are sensitive to a lot of different things, we are able to investigate the material properties; the big challenge is to disentangle the different effects that affect the resonance frequency.

My research entails the following aspects:

| Research Aspect | Answer |
| --- | --- |
| Use/collect personal data (health data, interviews, surveys) | No |
| Use/collect experimental data (lab experiments, measurements with instruments) | Yes |
| Collaborate with industry | Maybe |
| Write/develop software as the main output of the project | No |
| Use code (as in programming) for data analysis | Yes |
| Work with large data (images, simulation models) | Yes |
| Other | N/A |

Reflections on the importance of RDM videos

The points made to support RDM seem like open doors, as most of this was already taught to me in my BSc, but since starting my PhD it has become clear to me that although most people agree it is a good idea, in reality it is not always implemented. A timely example is a question I got last week from a colleague who wanted to use some data measured in 2016 (so long before I got here) on a setup I'm currently working on. After some searching we figured out that the only place the data was stored is some old disused PC that was left in a cupboard in the lab, which I only knew existed because I had once come across it while looking for something else. I've tried pointing out to my PIs some issues with the way we do things now, but they keep saying I shouldn't waste my time on improving it.

What would you like to learn during this course?

(See the reflection above.) I'm very aware that the current way we store our data and handle our code in our lab is not very efficient or safe, so I definitely want to improve this, but I'm not sure where to start or what would be a good system to set up. Looking around online, I almost only find concrete examples of how to deal with very large datasets, very different types of data, personal data, or code specifically, none of which I have been able to map onto my situation. I'm hoping that after this course I will have a better idea of how to go about this.

Checklist assignments

EstherPlomp commented 1 year ago

Hi @mausi122 ! Thanks for handing in your assignment 2!

It looks very good and extensive: well done!

I have a couple of comments/suggestions to consider:

mausi122 commented 1 year ago

Hi @EstherPlomp,

Thank you for the feedback.

francescozatelli commented 1 year ago

Hi @mausi122,

The data flow map is very detailed and it looks good! It was interesting to read because I think we work in similar ways in certain respects. I also had to deal with some of the challenges you are facing now :)

ArjanMejas commented 1 year ago

Dear Maurits,

That's a very well-structured review of a rather large and complex set of data.

Best, Arjan

mausi122 commented 1 year ago

> Hi @mausi122,
>
> The data flow map is very detailed and it looks good! It was interesting to read because I think we work in similar ways in certain respects. I also had to deal with some of the challenges you are facing now :)
>
> • We also had to back up the measurement data of a dedicated measurement PC. The way we eventually implemented it is to have a .bat script that copies the whole drive of the measurement PC to the U: drive. You can use 'robocopy' for this (it's a feature of Windows, so you don't need to install anything) and you can easily find online how it works and customize it. It's basically just a one-line script. Then we use the Task Scheduler of Windows to run this script every hour. Only the changed files are copied, so it's efficient. This has been working quite well so far and it's very easy to implement.
> • For the measurement scripts, have you considered using QCoDeS? It's a data acquisition framework that could take care of a lot of these things. With it you can control the instruments you use to run your experiments and store the results in databases. The nice thing is that together with the measurement results, it automatically stores plenty of metadata. For example, you can store as metadata all the parameters of all your instruments so that they can be retrieved later on. I'm not sure if this is applicable to your case, but if the changes in the measurement scripts are really minor it could be an idea to have one general script and store its details as metadata.
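A rough sketch of what the QCoDeS approach described in that last point can look like (assuming a recent QCoDeS release; the dummy instrument and all names here are illustrative stand-ins, not the lab's actual drivers or parameters):

```python
import qcodes as qc
from qcodes.dataset import (
    Measurement,
    initialise_or_create_database_at,
    load_or_create_experiment,
)
from qcodes.instrument_drivers.mock_instruments import DummyInstrument

# One database file collects all runs; each run is tagged with an experiment.
initialise_or_create_database_at("./experiments.db")
exp = load_or_create_experiment("nanodrum_sweep", sample_name="sample_A")

# A real setup would register actual instrument drivers here instead.
drive = DummyInstrument("drive", gates=["frequency", "amplitude"])
station = qc.Station(drive)  # the station snapshot (all parameters) is stored with each run

meas = Measurement(exp=exp, station=station)
meas.register_parameter(drive.frequency)
meas.register_parameter(drive.amplitude, setpoints=(drive.frequency,))

with meas.run() as datasaver:
    for f in range(10_000_000, 10_000_100):  # a tiny illustrative frequency sweep
        drive.frequency(f)
        datasaver.add_result(
            (drive.frequency, f),
            (drive.amplitude, drive.amplitude()),
        )
```

The measurement script itself stays generic; the run-specific settings travel with the data as part of the station snapshot.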

Hi @francescozatelli,

Thanks for the feedback. I'll definitely check out 'robocopy'; it seems like a good solution. We were also already thinking about using QCoDeS, but so far we haven't implemented it, as a lot of the equipment we use has no existing drivers and writing those will be a lot of work. We have been using it occasionally on a similar setup that we use a bit more as a test bed, so maybe in the future we can replace the scripts with it.
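For reference, the one-line backup script described in the quoted comment might look something like this (the drive letters, folder names, and flags are hypothetical, not from the actual setup):

```bat
:: backup.bat -- mirror the measurement PC's data folder to the U: drive.
:: robocopy skips files that are already up to date, so hourly runs
:: (scheduled via Windows Task Scheduler) stay cheap.
robocopy "D:\MeasurementData" "U:\measurement-pc-backup" /E /R:2 /W:5 /LOG+:"U:\backup-log.txt"
```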

EstherPlomp commented 1 year ago

Thanks all for the replies and helpful input!

* The GitHub/GitLab mention is indeed because I'm not sure which of these to use. I have some basic experience with GitHub, but from the TU Delft storage solutions I understood that GitLab was the method preferred by the university. I don't really have any sensitive data/code, but most of the code isn't that useful for people outside our lab either. So I'm not sure which is better to use, and I think any difference would be small.

GitLab is not necessarily the preferred solution at TU Delft - we just have an instance of it that is more secure. But if you're not working with sensitive data and you don't have external collaborators, it indeed doesn't matter much. You could try out both and see which one fits better, or just pick one and stick with it :)

* For image editing I mainly use Adobe Illustrator, for changing aesthetic parts of figures to fit the publication (changing fonts, font sizes, colours used, etc.); any data processing I do using Python scripts before that. I found that it is useful to keep the Adobe Illustrator .ai files, as they allow for easy reuse of and small changes to the figures for posters or presentations.

Thanks for your elaboration there!

* I don't really understand the second-to-last point you give, about the OneDrive/Project Drive: could you maybe elaborate on it? I currently use the personal "OneDrive - Delft University of Technology" account mostly to act as a backup for my laptop. But since this expires when I leave, I'm not sure how useful it is for datasets.

Sure! What I meant was that you can also use OneDrive as your 'active' storage solution while you are still processing the data: once you no longer need to use it as much, you can transfer it to the project drive. This is easier when you want to work on multiple devices and/or when you don't have internet and want to work locally. You should indeed not use OneDrive as your long-term storage location, as the account will expire once you leave the TU Delft. I hope this clarifies things? Please let me know if not!

* By the dynamic flag I mean that these scripts are changed a lot, and the changes are very much on a case-by-case basis for each sample or measurement run. The challenge is that I want to be able to tell from each dataset which exact version was used, but as the scripts have a lot of very minor variations that change back and forth, full versioning would easily end up with 50+ versions per month. And it isn't the case that version 2 is an improvement over version 1; it's more of a (temporary) tweak.

That sounds complicated indeed! I'm not sure if I have alternative solutions to QCoDeS (and/or GitHub/Lab). I'll ask some colleagues for advice and see if they come up with anything else!

EstherPlomp commented 1 year ago

Just to pass on a comment I have received from a colleague so far:

If it is a script with minor changes (typically configuration of measurement devices or experiment parameters), add it to the dataset as a file, but do not commit it to git. I consider that a lack of standardisation (which is ok). Ideally you would want to add a standardised config file to the dataset, but in lack thereof a flexible Python script is the alternative, which I would add to the dataset (also when publishing the data).
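A small sketch of that "standardised config file next to the dataset" idea (all file names and parameter values here are hypothetical):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Give every measurement run its own timestamped folder.
run_dir = Path("data") / datetime.now(timezone.utc).strftime("run_%Y%m%dT%H%M%SZ")
run_dir.mkdir(parents=True)

# Record the per-run tweaks that would otherwise only live in the script,
# so each dataset documents exactly which settings produced it.
config = {
    "sample_id": "sample_A",
    "drive_power_dbm": -10,
    "frequency_range_hz": [10e6, 30e6],
    "script": "resonance_sweep.py",
    "tweak_note": "temporarily lowered drive power for this sample",
}
(run_dir / "config.json").write_text(json.dumps(config, indent=2))

# ...run the measurement and save its output into run_dir next to config.json
```

This sidesteps the 50+-versions-a-month problem: the script in git stays general, while the back-and-forth tweaks are captured per dataset.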

mausi122 commented 1 year ago

Custom README Template: README - Custom Template.txt This is the template for the most common type of dataset in my project. There are a lot of different types for which I'll have to modify this, but they'll all follow this overall structure.

EstherPlomp commented 1 year ago

Thanks for sharing your README template! I'll have a look later this week!

Another tool/software to try out could be DataLad.

EstherPlomp commented 1 year ago

Well done on assignment 3 @mausi122 ! It again looks great and comprehensive! 👍

Just some small comments from my side:

Data organisation

Documentation

Access

Publication

And I will still have a look at your README file - apologies for not having looked at it yet!

francescozatelli commented 1 year ago

Hi @mausi122, the README file looks good. I think you included all of the fundamental information needed. Just a couple of comments: adding some instructions to explain how to read and plot the data (and/or some minimal scripts to do it) could be very convenient for people interested in your data. If I understood correctly, you provide the raw datasets (.mat files) and the plots (.png), but not the scripts to go from one to the other. Another detail that could be relevant to add is an explanation of what data is or is not included (is it all the data? If not, what is not reported?)
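A minimal loading/plotting script of the kind suggested above could look like this ("measurement.mat" and the variable names are placeholders for whatever the real datasets use):

```python
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Load one raw dataset and reproduce the corresponding .png plot.
data = loadmat("measurement.mat")
frequency = data["frequency"].squeeze()  # drive frequency, Hz
amplitude = data["amplitude"].squeeze()  # measured response, a.u.

plt.plot(frequency, amplitude)
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude (a.u.)")
plt.tight_layout()
plt.savefig("measurement.png", dpi=300)
```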

francescozatelli commented 1 year ago

And here is the feedback for Assignment 3, sorry for the double notification. This also looks great and very detailed! Just a couple of comments:

EstherPlomp commented 1 year ago

Thanks for sharing your README file, Maurits! I think you have set things up nicely with the placeholder information - well done!

PabloVelazquezGarcia commented 1 year ago

Hey Maurits, here is my feedback for Assignment 3

Both Esther and Francesco have already talked about the most interesting suggestions. I would like to ask: how do you manage to show your students how to properly store all the data in such a complex structure? It is definitely very well organized, but I wonder if it is easy for bachelor and master students to keep up with such complexity. I try to use very intuitive structures and names for my files, but this is not easy when the amount of data you create is this big!