EstherPlomp / TNW-RDM-101

Self paced materials of the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Assignment 1 Federico Ramirez #79

Closed federa7 closed 11 months ago

federa7 commented 1 year ago

Introduction

Hello! My name is Federico Ramirez and I am a new PhD student at the Bionanosciences department of the Applied Sciences faculty. I come from Mexico and have been in the Netherlands for three years already.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

I am working in the field of synthetic biology, particularly on the project of developing a synthetic cell using a bottom-up approach. Basically, I take purified, non-living biological components, such as DNA, proteins and lipids, and put them back together to form cell-like compartments capable of mimicking cell-like activities.

My research entails the following aspects:

| Research Aspect | Answer |
| --- | --- |
| Use/collect personal data (health data, interviews, surveys) | No |
| Use/collect experimental data (lab experiments, measurements with instruments) | Yes |
| Collaborate with industry | No |
| Write/develop software as the main output of the project | No |
| Use code (as in programming) for data analysis | Yes |
| Work with large data (images, simulation models) | Yes |
| Other | N/A |

Reflections on the importance of RDM videos

What I got most out of the videos is the importance of DM for transparency and ethics in science, as well as for reproducibility and overall science quality. I did my Master thesis in this same lab, and I can already see what I could have done better in managing my data: my former supervisor (now co-worker) has to ask me how to interpret information or where to find it. I think of what it would be like for her if I were not here, and it seems scary.

What would you like to learn during this course?

I look forward to learning good practices and recommendations on how to standardize data storage and management. I also hope to find introductory steps and information on how DM relates to, and bridges to, open science.

Checklist assignments

EstherPlomp commented 1 year ago

Copied from #95:

Hello everyone,

I share with you the link to my assignment 2. I hope it is clear for everyone. In my case, I couldn't think of particular flags for the type of data I will be working on.

Let me know if you identify any point of improvement or if you have any comment.

Best, Federico

EstherPlomp commented 1 year ago

Thanks for sharing assignment 2 @federa7! It looks very clear and comprehensive, so I have little feedback! Well done!

nnadalalemany commented 1 year ago

Hi Fede, very nice DMP! I really liked how you carefully entered all the information associated with each dataset. As a suggestion, maybe you could list the actions associated with each dataset in chronological order if possible, or even number them! That would make it easier to follow your workflow. Cheers, Natalia

EstherPlomp commented 1 year ago

Thanks for sharing assignment 3 @federa7! It looks very clear and extensive! Well done especially on the data publication part where you have looked into the various options available to share the data!

Metadata

file format

Data publication

federa7 commented 1 year ago

Thanks for sharing assignment 2 @federa7! It looks very clear and comprehensive, so I have little feedback! Well done!

  • With 'A significant amount of my data remains stored in the source equipment for one month.' do you mean that after you transfer the data it will still be there on the source equipment for a month? Or do you only transfer the data after a month? I hope the former? I suppose it will have to be removed because of the size involved?

Hi Esther, this is a little late, but I would still like to respond to your comments! Indeed, the former: the data is so large that it has to be erased frequently from the equipment's memory. I transfer the data to my U: drive right after I finish my experiment. In some particular cases (microscopy) I've noticed that the data gets corrupted during the transfer (I usually realize this immediately or the day after, when I start analysing the data), so it is good that I still have some days to retrieve it properly.

EstherPlomp commented 1 year ago

Hi Esther, this is a little late, but I would still like to respond to your comments! Indeed, the former: the data is so large that it has to be erased frequently from the equipment's memory. I transfer the data to my U: drive right after I finish my experiment. In some particular cases (microscopy) I've noticed that the data gets corrupted during the transfer (I usually realize this immediately or the day after, when I start analysing the data), so it is good that I still have some days to retrieve it properly.

No worries - thanks for still following up! Indeed - that gives you some room when unfortunate situations like data corruption happen. I hope that doesn't happen too often; it is very annoying to have to retrace your steps like that!
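One way to catch this kind of corruption right at transfer time, rather than a day later during analysis, is to compare checksums of the file on the source equipment and the copy on the network drive. The snippet below is only a minimal sketch using the Python standard library; the function names `sha256_of` and `copy_and_verify` are made up for illustration, and it assumes both locations are reachable as ordinary file paths from the same computer.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large microscopy files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy a file (hypothetical helper) and report whether the copy
    has the same checksum as the original."""
    shutil.copy2(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

If `copy_and_verify` returns `False`, the copy differs from the original and the transfer can be repeated while the data is still on the equipment.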

federa7 commented 1 year ago

Thanks for sharing assignment 3 @federa7! It looks very clear and extensive! Well done especially on the data publication part where you have looked into the various options available to share the data!

Metadata

file format

  • I'm not entirely sure whether the .FCS and .sky file formats are open: it looks like you do need specific software to fully interact with the files, so they might be proprietary. As long as that is the main file format in use for these types of files/analysis, that is also not a problem. .RIF does appear to be proprietary, so .TIF would be the open alternative.

Data publication

  • FlowRepository, LIPID MAPS, and PRIDE look like great solutions, if they are indeed compatible with your data
  • I guess you would make use of the European Nucleotide Archive as part of the INSDC?
  • Cytobank looks more like a solution for active data management, not necessarily publication. They do state the following: "Cytobank fills a key NIH mandate for making published data and results available to the scientific community, and has a Reports system for hosting an interactive analysis to accompany a journal publication." I find it a bit difficult to assess this since there is no browsing of these reports.
  • Do note that supplementary materials do not follow the FAIR principles, so depending on your project's funding requirements this may not be sufficient. This is because these supplementary materials do not have their own identifiers/DOIs. You can always make use of 4TU.ResearchData for these types of large files, since this repository allows you to upload 1 TB per year for free.
  • Microscopy repositories that I'm aware of are:

Hi Esther,

Metadata

I was indirectly aware of these standards, mostly through exposure to them (the SBOL ones), since they are very well established in the field. It is good to now be aware of their full extent and application.

file format

You are right about the .sky format. It is proprietary and requires specific software (although this software is freely available). The open formats that are fully and directly interchangeable are the mz formats (.mzXML, .mzML, etc.). Regarding .FCS files, I would insist that this is the standard used for the technology, to the extent that instruments from different brands generate their data in this format and it is readable by various proprietary and freely available software. There are several open source Python libraries that allow analysis of the data. I believe this data can be converted to .CSV format, but I'm not entirely sure it would really represent an advantage in accessibility, given that .FCS is already highly compatible.

Data publication

Thank you for your comments! Federico