EstherPlomp / TNW-RDM-101

Self paced materials of the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Assignment 1 Ran Huo #64

Closed r-huo closed 1 year ago

r-huo commented 1 year ago

Introduction

Hi! My name is Ran and I'm a second-year PhD student at the Bionanoscience department.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

I research novel optical microscopy techniques for biological applications; in other words, I take fancy images of cells at the nanometer scale to see what's happening inside and between them :)

My research entails the following aspects:

| Research aspect | Answer |
|---|---|
| Use/collect personal data (health data, interviews, surveys) | No |
| Use/collect experimental data (lab experiments, measurements with instruments) | Yes |
| Collaborate with industry | No |
| Write/develop software as the main output of the project | No |
| Use code (as in programming) for data analysis | Yes |
| Work with large data (images, simulation models) | Yes |
| Other | N/A |

Reflections on the importance of RDM videos

What I learned from the video is that RDM is not only about safely backing up and organizing your data, but also about making data reproducible and even open, which should be kept in mind from the moment the data are acquired. For me personally, it has always been a nightmare whenever I have to present my experimental results at a meeting or conference and need to dig the actually useful data out of numerous folders on multiple devices.

What would you like to learn during this course?

As an experimentalist generating large volumes of data (images) on a daily basis, often at different locations, I would like to learn from my peers how they organize their raw data, metadata and analysis scripts in a simple and effective way.

Checklist assignments

EstherPlomp commented 1 year ago

Thanks for sharing @r-huo! Your assignments 2 and 3 look clear. Well done!

Some pointers: Assignment 1:

Assignment 2: File formats

I hope this helps!

Miytek commented 1 year ago

Hi Ran, I have checked your Data Flow, nice structure :) I also noticed that you are planning to back up at home. Another alternative would be to use a project data folder: either the one we applied for as a group, or you can apply for one for your own project :)
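Whichever backup location ends up being used (a drive at home or a project data folder), it can be worth checking that the copy is complete and identical to the original. A minimal sketch in Python, assuming both copies are ordinary folders reachable from the same computer; the function names and the choice of SHA-256 are illustrative, not part of any course tool:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large microscopy images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(source: Path, backup: Path) -> dict:
    """Return relative paths that are missing from, or differ in, the backup."""
    missing, mismatched = [], []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file():
            missing.append(rel.as_posix())
        elif sha256_of(src_file) != sha256_of(dst_file):
            mismatched.append(rel.as_posix())
    return {"missing": missing, "mismatched": mismatched}
```

For example, `compare_backup(Path("raw_data"), Path("backup/raw_data"))` (hypothetical paths) lists the files that still need to be copied or re-copied. Hashing every file is slow for very large image sets; comparing file sizes first is cheaper, at the cost of missing silent corruption.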

Miytek commented 1 year ago

Hi again! Another very nice structure :) One comment: the naming could be problematic. I used similar naming, and now when I try to copy files to my project folder, some of them are rejected because the path is too long. FYI :)
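A quick way to spot this before copying is to compute how long each path would become once placed under the destination folder. A minimal sketch, assuming a 260-character limit (the classic Windows `MAX_PATH`); the real limit of a given project drive may differ, so it is a parameter, and `find_long_paths` and the example prefix are hypothetical:

```python
from pathlib import Path

# Assumption: 260 characters is the classic Windows MAX_PATH limit; the actual
# limit on a specific network drive may be different.
DEFAULT_MAX_LEN = 260


def find_long_paths(root: Path, destination_prefix: str, max_len: int = DEFAULT_MAX_LEN) -> list:
    """List files whose path would exceed max_len once copied under destination_prefix.

    destination_prefix is the (hypothetical) folder the data will be copied
    into; its length counts towards the limit as well.
    """
    too_long = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # +1 for the separator between the prefix and the relative path
        total = len(destination_prefix.rstrip("/\\")) + 1 + len(rel)
        if total > max_len:
            too_long.append((rel, total))
    return too_long
```

Running it with the intended destination, e.g. `find_long_paths(Path("my_data"), "P:/project/subfolder")`, returns each offending file with its projected length, which shows which folder or file names are worth shortening.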

r-huo commented 1 year ago

I wrote a sample readme.txt based on the Cornell template for a dataset from an open microscopy hardware project that I intend to publish. I paid attention to whether the files are in open formats, and included the information needed to view the data. https://surfdrive.surf.nl/files/index.php/s/pAGkRILswEvcBwC
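Checking whether files are in open formats can be partly automated by counting the file extensions in the dataset folder. A minimal sketch, assuming an illustrative allow-list of extensions (not an official list of open formats); anything outside it, such as a proprietary CAD part file (`.sldprt`), is flagged for a manual check:

```python
from collections import Counter
from pathlib import Path

# Hypothetical allow-list of extensions treated as open here; this is an
# assumption to adapt, not a standard.
OPEN_EXTENSIONS = {".tif", ".tiff", ".txt", ".csv", ".md", ".pdf", ".svg", ".png", ".json", ".py"}


def audit_formats(root: Path) -> dict:
    """Count file extensions under root, split into open vs. to-be-checked."""
    counts = Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())
    open_found = {ext: n for ext, n in counts.items() if ext in OPEN_EXTENSIONS}
    to_check = {ext: n for ext, n in counts.items() if ext not in OPEN_EXTENSIONS}
    return {"open": open_found, "check": to_check}
```

The "check" group is only a prompt for a decision: some formats outside the list may well be open, and the README can then state which software is needed to view them.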

v2: https://surfdrive.surf.nl/files/index.php/s/L6LUFr8Ql5ZiNP0

Miytek commented 1 year ago

Hi Ran! Nice readme file :) Mine is similar to yours, except that I preferred to delete the parts I don't have anything to say about (such as the related citation in your case) to make it easier to read.

EstherPlomp commented 1 year ago

Hi @r-huo! I think you may have updated the readme file, so the SURF link broke. Can you please reshare a link to your README?

Please also don't forget to provide other people with feedback on their assignments and to complete assignment 5!

r-huo commented 1 year ago

Thanks for sharing @r-huo! Your assignments 2 and 3 look clear. Well done!

Some pointers: Assignment 1:

  • In principle, using the umbrella drive is already a sufficient backup. But it can never hurt to have another backup on a hard drive!
  • Any data stored via eLABjournal is automatically backed up via eLABnext, so that is also taken care of. If all else fails, there is indeed paper :)
  • Great screenshot of your folder structure! Don't forget to also add this to the folder structure thread.

Assignment 2: File formats

  • Are the CAD files also in an open format? We had some discussion about this during the course - they can be converted to .svg and .pdf files but that may hamper the interoperability of these file types and may result in loss of information.
  • .TIFF and .txt files are also open, and any files in eLABjournal should also be convertible to an open file format.

Access/Publication

  • From what I understand from your assignment, you can't share all the data? Not sharing all the raw data is fine, but according to the TU Delft data policies, the data underlying publications/thesis chapters should be shared following the FAIR principles. This also means you'll need a data repository to share the data/code. In your data management plan you mentioned 4TU.ResearchData. While GitHub is a great platform to manage and share code, it doesn't have a long-term preservation guarantee and it doesn't assign DOIs to repositories. For this you would need the data repository. GitHub has an integration with Zenodo, and 4TU.ResearchData has an integration with Git.
  • Regarding licenses, in your DMP you indicated: data is licensed under CC BY-NC-SA, and software under GPL 3.0+. If these preferences change after any discussions with your supervisor, you can also update your plan on DMPonline if needed. Please also add your preferred licenses to the license thread.

I hope this helps!

Hi Esther! Thank you very much for the feedback! The data I produce will mostly be open, except when restricted by vendors or collaborators. I'm considering using Zenodo and assigning DOIs for accessibility. I'm not yet sure about the formats of the CAD files, since they are usually created in commercial software. I know of some free online viewers for commercial CAD files, but these won't help much with adapting the work.

r-huo commented 1 year ago


Hi! The readme file link had expired. It should work again now :)

EstherPlomp commented 1 year ago

Hi @r-huo

Thanks for sharing your README!

r-huo commented 1 year ago

Hi @r-huo

Thanks for sharing your README!

  • Like @Miytek already mentioned: some of the questions that are not applicable to your dataset can be removed. For the recommended citation I would enter something, however: This could be the dataset itself (so then the title, authors, and perhaps adding Zenodo) OR refer to the accompanying publication. If you don't make it easy for people to cite the data/work they'll have to guess and they might not do what you would prefer.
  • I don't see you mentioning any ORCIDs: this is a persistent identifier for researchers and provides a space for you to collect research outputs. You can also use your ORCID to log in to Zenodo and other platforms (publisher platforms, for example). You can have a look at Kristin's ORCID to see what it looks like when you have more research outputs. You can get your own ORCID via the website.
  • For any additional data: if you say request via email, please also provide the preferred email. You have three emails listed in the contributors list, so it may help people to reach out if you put a preferred email there.
  • Well done on including your folder structure in the README!
  • Regarding quality assurance procedures: Are there any decisions you make to exclude data? When is something good enough to be included in the dataset?
  • For the experimental conditions, your answer seems a bit general to an outsider like myself: Is there anything more specific you can refer to? A publication where the method is explained in more detail? Or the method section of the accompanying paper?
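Since an ORCID iD is usually copied into a README by hand, a typo is easy to make. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be caught with a few lines of code. A minimal sketch (the function names are illustrative); `0000-0002-1825-0097` is the example iD used in ORCID's own documentation:

```python
DIGITS = "0123456789"


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the final check character."""
    parts = orcid.split("-")
    if len(parts) != 4 or any(len(p) != 4 for p in parts):
        return False
    compact = "".join(parts)
    base, check = compact[:15], compact[15]
    if not all(c in DIGITS for c in base):
        return False
    return orcid_check_digit(base) == check.upper()
```

`is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the last digit to `8` makes it return `False`. The check only confirms that the iD is well formed; it does not confirm that it belongs to the right person.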

Hi Esther! Thanks a lot for the feedback. I've updated my readme.txt (readme v2): I added my newly registered ORCID and the excluded datasets, added more experimental details, and tried to improve the readability. It's a great point to make it clear to people how to cite the data. I'll keep it in mind for when the dataset is actually released alongside a publication; for now it's good practice. I plan to exclude one dataset that is very specific to our lab practice (home-made control scripts for one type of laser we use), which others can easily replace with scripts for the lasers they already use. I would also consider publishing it once it has been tested for reliability. Since it won't affect the usability of the other datasets, I think it's okay to make it available only on request by email. Or would it be better to include the scripts with a disclaimer?

EstherPlomp commented 1 year ago

Hi @r-huo! Glad to hear that the feedback is helpful!

I think if it is very specific to your lab it might not make a lot of sense to share the scripts yet - unless this is something that other research groups are otherwise also having to figure out? On the other hand: If you're comfortable sharing these and it is not a lot of additional work, you can also share them and update them again if needed once these are tested for reliability etc. Zenodo allows for versioning, also via releases pushed via GitHub, so this should be workable.

It depends on what you're comfortable with, and also what your supervisor's preferences are! I hope that makes sense - if not, please let me know: I'm happy to have a chat!