Assignment 1 Daniel Muis

dmuis commented 1 year ago

Introduction

Hello everyone, my name is Daniel Muis and I'm a first-year PhD student at the Faculty of Applied Sciences. I work in the Quantum Nanoscience department, specifically in the Kuipers lab. My number one hobby at the moment is running! I ran the Rotterdam marathon last month and am planning to participate in many more, including international ones!

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

During my research I aim to fabricate a photonic device that consists of a monolayer of tungsten disulfide on a suspended quantum valley-Hall photonic crystal. Specifically, I will measure the near-field optical spin of this photonic device to investigate the light-matter interaction of the topological edge polariton.

My research entails the following aspects:

Research Aspect	Answer
Use/collect personal data (health data, interviews, surveys)	No
Use/collect experimental data (lab experiments, measurements with instruments)	Yes
Collaborate with industry	No
Write/develop software as the main output of the project	No
Use code (as in programming) for data analysis	Yes
Work with large data (images, simulation models)	Yes
Other:	N/A

Reflections on the importance of RDM videos

I think data reproducibility is extremely important, even though I'm not giving it enough attention yet. I try to teach myself the habit of storing all information immediately, as if I had to explain it to a fellow researcher one hour later. This mostly means writing everything down in my lab journal, down to the tiniest detail, and putting my findings in PowerPoint slides every week to share with my group. I don't have one specific horror story, but one thing I have learned is that when you produce data or figures from many-parameter simulations, you should store them all and also document in text what you did and why you did it. Too many times I had to redo the exact same simulation because I forgot which parameters I used...

What would you like to learn during this course?

Normalize backing up data
Make data shareable and interpretable online (but private, to share with collaborators)

Checklist assignments

[x] Assignment 1: creating a GitHub issue (before Class 1)
[x] Respond to the GitHub issue on 'data challenge'
[x] Assignment 2: Data Flow Map 1 (share a link in this issue before 17 May 13:00). Link: https://surfdrive.surf.nl/files/index.php/s/dZ87V7hvasO1ov0
[x] Provide feedback to at least one Assignment 2 from another participant
[x] Respond to the GitHub discussion on 'licenses'
[x] Respond to the GitHub discussion on 'folder structure'
[x] Respond to this GitHub issue with your readme file:
[x] Provide feedback to at least one readme file from another participant
[x] Assignment 3: Data Flow Map 2 (share a link in this issue before 31 May 13:00). Link: https://surfdrive.surf.nl/files/index.php/s/f6hqUjf2rCHVKI4
[x] Provide feedback to at least one Assignment 3 from another participant
[x] Assignment 4: Data Management Plan (before Class 2)
[ ] Respond to the GitHub discussion on 'Data Management Plans' if you have any questions (optional)
[x] Assignment 5: Data Flow Map 3: submit your slide (before Class 2)

EstherPlomp commented 1 year ago

Hi @dmuis! Thanks for handing in your assignment 2!

It looks very extensive, well done! I don't have a lot of feedback, just a couple of thoughts:

Good idea to do manual backups immediately so that there's less chance to forget things! I also like how you mention 4TU.ResearchData as a backup of the publicly shared results - this is indeed the added benefit of sharing your data!
As also mentioned to others working with images/Python: the eScience Center organises a workshop on 'Image Processing with Python' every now and then, which might be of interest to you.
To manage your scripts you may also consider using either GitHub or GitLab. TU Delft has a GitLab instance that they host, which makes it more secure for more sensitive scripts (which I don't think is the case for you because I don't see any red flags for commercially sensitive data?). On the other hand, if you want to collaborate with others, GitHub is easier to use since everyone can make an account and jump in.

mausi122 commented 1 year ago

Hi @dmuis,

I think it looks good. My only feedback is that, as I understand it, you do a lot of the backing up manually. To me that seems prone to human error, so it may be good to check whether an automatic backup can be set up.
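As an illustration of the automatic backup suggested above, here is a minimal Python sketch using only the standard library. It is not the set-up anyone in this thread actually uses: the function names `sha256_of` and `backup_new_files` and both folder arguments are hypothetical. It copies files that are missing or changed in the backup folder and verifies each copy with a checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_new_files(source_dir: Path, backup_dir: Path) -> list[str]:
    """Copy files that are missing or changed in the backup.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        target = backup_dir / relative
        if target.exists() and sha256_of(target) == sha256_of(source_file):
            continue  # already backed up and unchanged
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)
        # Verify the copy so a silent write error does not go unnoticed.
        if sha256_of(target) != sha256_of(source_file):
            raise IOError(f"Checksum mismatch after copying {relative}")
        copied.append(relative.as_posix())
    return copied
```

A function like this could be called at the end of a measurement script or from a scheduled task, so that the backup no longer depends on remembering to do it by hand.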

dmuis commented 1 year ago

@EstherPlomp I don't fully understand the assignments regarding the 'folder structure' and README file. Is your answer in the discussion forum not the same as your README file? Also, I didn't really understand what the cookiecutter does. I did clone the template from GitHub and got a nice folder overview this way, but what does cookiecutter have to do with this folder structure README file? And finally, what is meant by responding to this issue with your readme file: are we expected to share a surfspot link again?

S-Aron commented 1 year ago

Hi @dmuis, here is a partial answer, at least about the README file. You can copy the text from here (https://cornell.app.box.com/v/ReadmeTemplate) to a .txt file and fill it in with the required data. After saving, you can share your README file here. That's basically the task (in short).

EstherPlomp commented 1 year ago

No worries @dmuis: good that you're reaching out! And thanks @S-Aron for stepping in and explaining the README assignment!

The folder structure is focussed more on your folder organisation itself - the readme file generally includes more contextual information about who is behind the folder structure, what methods you have used, and more detailed information. You can check the template that @S-Aron shared and answer the questions in there. There is some overlap where you can include your folder organisation in the readme file, as the readme file would be a good location to provide explanation about how you structured your data/folders and so forth. I hope this is starting to make more sense now?

You can indeed either share your README file in this issue or copy/paste the text into a response: your README should be in plain text, so it should be straightforward to just paste it here in a reply!

The cookiecutter provides you with a more automated way of setting up your folder structure, which can be especially helpful if you have to use a similar folder organisation multiple times (for example, when you have 5 students and you want each of them to use the same folder structure for their project). In addition, the template that we're using for this follows some recommended practices, which you can find here: Good Enough Practices in Scientific Computing.
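To make the idea of an automated folder set-up concrete, here is a minimal standard-library Python sketch. It is not cookiecutter itself and does not reproduce the course template: the folder names in `FOLDERS` and the function `create_project` are assumptions chosen for illustration only.

```python
from pathlib import Path

# Hypothetical layout; the course template may use different folder names.
FOLDERS = ["data/raw", "data/processed", "docs", "results", "src"]


def create_project(root: Path, project_name: str) -> Path:
    """Create one project folder with the shared layout and a README stub."""
    project_dir = root / project_name
    for folder in FOLDERS:
        (project_dir / folder).mkdir(parents=True, exist_ok=True)
    readme = project_dir / "README.txt"
    if not readme.exists():
        # Only write the stub once so an existing README is never overwritten.
        readme.write_text(f"Project: {project_name}\n", encoding="utf-8")
    return project_dir
```

Calling `create_project` once per student in a loop would give every project the same folder organisation, which is the use case described above. A real cookiecutter template adds prompts and templated file contents on top of this.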

dmuis commented 1 year ago

This readme file was generated on [2023-05-31] by [Daniel Muis]

GENERAL INFORMATION

Title of Dataset: Phantom T2 VH ZZ termination 1.1 um

Author/Principal Investigator Information
Name: Daniel Muis
ORCID: N/A
Institution: Kavli Institute of Nanoscience, TU Delft
Address: Lorentzweg 1, 2600 GA Delft, The Netherlands
Email: b.d.muis@tudelft.nl

Author/Associate or Co-investigator Information
Name: Rene Barczyk
ORCID: 0000-0002-0497-860X
Institution: Center for Nanophotonics, AMOLF
Address: Science Park 104, 1098 XG Amsterdam, The Netherlands
Email: R.barczyk@amolf.nl

Author/Alternate Contact Information
Name: Kobus Kuipers
ORCID: 0000-0003-0556-8167
Institution: Kavli Institute of Nanoscience, TU Delft
Address: Lorentzweg 1, 2600 GA Delft, The Netherlands
Email: l.kuipers@tudelft.nl

Date of data collection: 2023-04-24

Geographic location of data collection: Delft, The Netherlands

Information about funding sources that supported the collection of the data:

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: CC-BY

Links to publications that cite or use the data: N/A

Links to other publicly accessible locations of the data: N/A

Links/relationships to ancillary data sets: N/A

Was data derived from another source? If yes, list source(s): N/A

Recommended citation for this dataset: N/A

DATA & FILE OVERVIEW

File List:
M020_Phantom_F00001.cfg
M020_Phantom_F00001.r00
M020_Phantom_F00001.r01
M020_Phantom_F00001.r02
M020_Phantom_F00001.r03
M020_Phantom_F00001.r04
M020_Phantom_F00001.r05
M020_Phantom_F00001.r06
M020_Phantom_F00001.r07
M020_Phantom_F00001.r08
M020_Phantom_F00001.r09
M020_Phantom_F00001.r10
M020_Phantom_F00001.txt

Relationship between files, if important: Every .r0x file contains data of the same measurement, but of a different device channel.

Additional related data collected that was not included in the current data package: N/A

Are there multiple versions of the dataset? If yes, name of file(s) that was updated: N/A
Why was the file updated?
When was the file updated?

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: PHANTOM set-up with self-built software from AMOLF.

Methods for processing the data: Lock-in amplifiers and a DAQ device.

Instrument- or software-specific information needed to interpret the data: Model SR830 DSP Lock-in Amplifier, Stanford Research Systems.

Standards and calibration information, if appropriate: see .cfg file

Environmental/experimental conditions: see .cfg file

Describe any quality-assurance procedures performed on the data: N/A

People involved with sample collection, processing, analysis and/or submission: Rene Barczyk for sample collection.

DATA-SPECIFIC INFORMATION FOR: [M020_Phantom_F00001.cfg]

Number of variables: Indicated in file

Number of cases/rows: N/A

Variable List: Indicated in file

Missing data codes: N/A

Specialized formats or other abbreviations used: N/A

dmuis commented 1 year ago

Additional note for the readme file above: it is way too much work for me to give data-specific information for each file. I also feel that it is unnecessary, because that is what the .cfg and .txt files are for: to clarify exactly what is measured in all the .r0x files.

EstherPlomp commented 1 year ago

Hi @dmuis: If the information needed to interpret the files is already provided elsewhere, I would also not repeat the information and just point to the .cfg and .txt files. In general, it is quite clear that people need to read the README file first (hence the title!) - but where they have to go from there can be more ambiguous, so it is helpful to explain which files make the most sense for people to look into.

You can also remove all the questions that are not relevant to you instead of answering them with N/A, since it is cluttering the READme. I do think you're using quality assurance procedures though - if you're using any standards/calibration settings before measuring, it is a form of quality assurance. So there you can again point to the .cfg file.

Lastly, you can also set up your own ORCID - this will be more helpful once you start to produce more research outputs. ORCID provides a nice space to collect all of these, and it is fully under your own control (instead of the profile that the university maintains).

Thanks also for submitting Assignment 3! It looks very clear, well done! I therefore only have one thought:

Metadata: Even if there is no standard it can be helpful to have an overview of all the metadata you would need to interpret the data. I guess this is already listed in this .cfg file?

dmuis commented 1 year ago

@EstherPlomp Thanks for your feedback. Regarding your thought on metadata: there are some abbreviations in the .cfg file, so this could use extra clarification. This could either be implemented in the .cfg file itself or I could make a new metadata file that gives an overview.

Edit: Thanks for the ORCID link, I will definitely get one soon.

bauerjana commented 1 year ago

Hi @dmuis ,

Thanks for sharing your clear readme file. I specifically like the additional information regarding the data from different device channels stored in individual .r0x files. This facilitates the accessibility and interpretation of your data set for external people. I agree with @EstherPlomp that the removal of all questions answered with N/A would make your file even more straightforward.

I do have a rather general question about the different indications of authorship at the top of the file. In my understanding, the Principal Investigator is the main responsible researcher, that is, the holder of the funding. In your case that would be Kobus Kuipers, I assume. Hence I wondered about the indications given in your file. Maybe @EstherPlomp can help in this case?

bauerjana commented 1 year ago

And, additionally, here comes my feedback for assignment 3 as well:

I really like that you have the automatic generation of the relevant measurement folders as well as the metadata files with your measurement routines and also mostly open source data formats (or easy conversion possibilities). A lot of good practices already seem to be implemented in your work. You also have plans on how to share the data with external collaborators. So I don't really have any more feedback.

ArjanMejas commented 1 year ago

Dear @dmuis,

Assignment 3: I fully agree with your answers to the given topics, it is always good to keep things simple and concise.

Readme.txt: A simple and effective description of your dataset. Judging by it, it is very clear that you have control over your data storage and your work.

dmuis commented 1 year ago

@EstherPlomp and others if interested, here is my slide for class 2 with my RDM challenges and improvements: https://surfdrive.surf.nl/files/index.php/s/qlomFqdE7Naoj6h

dmuis commented 1 year ago

Hi @bauerjana, thanks for your feedback! That is actually a very good point. It makes me wonder what the role of the PhD is then: is it the associate? @EstherPlomp, maybe you know this.

EstherPlomp commented 1 year ago

(Edited because I misread the question :))

Hi @bauerjana, @dmuis: Good questions regarding the role of the PhD as a contact person or main author/contributor on a dataset!

In general this is comparable to how you would publish an article. Even if your PI bears the main responsibility, and is probably indicated as last author, you will probably be the corresponding/first author. That means that you do the article/dataset submission and handle any correspondence between submitting and publishing. That also means that your contact details are generally in these files. When you switch institutions, your contact details can be less helpful than those of your PI. That's why they ask for the contact details of multiple contributors in the README file. Edit: I would list yourself first as lead/corresponding investigator, and then more clearly indicate that your PI is the Principal Investigator of the lab, listed at the end. I see associate/co-investigator more as another co-author/collaborator.

I hope that answered the question - if not, please let me know!

bauerjana commented 1 year ago

@EstherPlomp thanks for clarifying in this case! This way of listing the authors was indeed my expectation. So @dmuis, I would change your name and Kobus Kuipers in your file.

mausi122 commented 1 year ago

@dmuis Some late (I was at a conference) feedback on your readme.

Since I don't know what you measure, the readme file doesn't tell me much about what the data means. I agree that explaining this in each readme file is tedious and somewhat overkill, but you could, for example, link to an article where the measurement is explained. You can also probably generate a lot of the readme automatically, for example by copying author information from some database of users.
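The suggestion to generate parts of the readme automatically could look roughly like the sketch below. It is only an illustration under stated assumptions: the function `build_readme`, the structure of the `authors` records (standing in for "some database of users"), and the choice of fields are hypothetical, with the field labels borrowed from the readme posted earlier in this thread.

```python
from datetime import date
from pathlib import Path


def build_readme(title: str, authors: list[dict], data_dir: Path, generated_on: date) -> str:
    """Assemble a plain-text readme from author records and the files in data_dir."""
    lines = [
        f"This readme file was generated on [{generated_on.isoformat()}] by [{authors[0]['name']}]",
        "",
        "GENERAL INFORMATION",
        "",
        f"Title of Dataset: {title}",
        "",
    ]
    for author in authors:
        lines += [
            f"{author['role']} Information",
            f"Name: {author['name']}",
            f"ORCID: {author.get('orcid', 'N/A')}",  # falls back to N/A if no ORCID is stored
            f"Institution: {author['institution']}",
            f"Email: {author['email']}",
            "",
        ]
    # List every file in the dataset folder, sorted by name.
    lines += ["DATA & FILE OVERVIEW", "", "File List:"]
    lines += sorted(path.name for path in data_dir.iterdir() if path.is_file())
    return "\n".join(lines) + "\n"
```

The returned text can be written to a README.txt next to the data. The methodological sections would still need to be filled in by hand or linked to an article, as suggested above.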

EstherPlomp commented 1 year ago

> @EstherPlomp thanks for clarifying in this case! This way of listing the authors was indeed my expectation. So @dmuis, I would change your name and Kobus Kuipers in your file.

I think in this case the order is fine, but perhaps delete the part that says 'principal investigator' from Daniel's section and move that bit to Kobus instead of 'alternate contact'. In a sense you are your own project's principal investigator, but because we use this term in a different context (as in, your supervisor/lab leader is the Principal Investigator), it is confusing!
