EstherPlomp / TNW-RDM-101

Self-paced materials for the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Assignment 1 Pietro Sillano #72

Closed: pietro-sillano closed this issue 8 months ago

pietro-sillano commented 1 year ago

Introduction

Hi all, my name is Pietro Sillano and I'm a PhD student in the TNW/Bionanoscience department.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My research focuses on the interface between biology, soft matter and computational physics. I'm especially interested in how biological and artificial cells divide.

My research entails the following aspects:

| Research aspect | Answer |
|---|---|
| Use/collect personal data (health data, interviews, surveys) | No |
| Use/collect experimental data (lab experiments, measurements with instruments) | No |
| Collaborate with industry | No |
| Write/develop software as the main output of the project | Yes |
| Use code (as in programming) for data analysis | Yes |
| Work with large data (images, simulation models) | Yes |
| Other | N/A |

Reflections on the importance of RDM videos

I will work with extensive simulation codes and will likely need to write some tools for data analysis. Maintaining up-to-date, precise, and accurate documentation for both the simulation process and parameters and the analysis tools therefore becomes fundamental. I have one horror story from my master's thesis: I lost half of my simulation data (several gigabytes) because I did not have a data management and/or backup plan.

What would you like to learn during this course?

I already know some good data management practices, but I struggle to adopt them as a habit/mindset. I would also like to get better at writing accurate and clear documentation for my software code.

Checklist assignments

EstherPlomp commented 11 months ago

Comment moved from #96:

Hello everyone,

I am sharing my data flow map with you (file). Feel free to share any improvements or comments on it. Thank you! Best, Pietro

EstherPlomp commented 11 months ago

Thanks for sharing assignment 2 @pietro-sillano!

This already looks good! I'm just missing the file formats, which can partly be derived from what programming languages you're using, but for the purposes of assignment 3 it might be easier to just list them. I'm also wondering if the reuse of a previous collaborator's codebase might warrant a red flag: do you have all the information you need to work with this? Can you still contact this person if you run into any issues? How long would they be available to work together on this?

And a suggestion for your data storage: perhaps it would be easier to use SURF or OneDrive as active storage, which is automatically backed up, and then back this up to the project drive at regular intervals? That might save you some time copying and transferring things. See the storage solution page for more information about all these storage solutions.
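A minimal sketch of the "back this up at regular intervals" step, assuming Python 3.9+ and that both the working directory and the project drive are reachable as ordinary paths (the paths in the usage line are hypothetical). It copies only new or changed files and verifies every copy with a SHA-256 checksum, so a corrupted transfer is noticed immediately rather than months later.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source: Path, target: Path) -> list[Path]:
    """Copy new or changed files from source to target, verifying each copy.

    Returns the paths (relative to source) that were copied on this run.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = target / rel
        src_hash = sha256sum(src_file)
        # Skip files whose backup already has identical content.
        if dst_file.exists() and sha256sum(dst_file) == src_hash:
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        # Re-read the copy and compare before trusting it as a backup.
        if sha256sum(dst_file) != src_hash:
            raise OSError(f"checksum mismatch after copying {rel}")
        copied.append(rel)
    return copied
```

For example, `backup_tree(Path("simulations"), Path("/path/to/project-drive/simulations"))`, run by hand or from a scheduled job. Hashing every file on each run gets slow for multi-gigabyte trajectories; an established tool such as `rsync -a` performs the same kind of incremental copy with less effort.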

And great to see that you're already familiar with GitHub :)

FanJ-TUD commented 11 months ago

Hi Pietro, very detailed data flow map! I'm just curious how you manage to make good use of the previous guy's database. Do you keep the same structure, or is there still a lot to modify?

pietro-sillano commented 11 months ago

@FanJ-TUD @EstherPlomp Thank you for your feedback!

@FanJ-TUD Maybe I did not explain it clearly, but by codebase I mean the previous guy's code for a simulation software, so it is not actually a database. Anyway, your question is still valid, and yes: I will keep the main structure of the software, at least at the beginning of my project.

@EstherPlomp I did not specify the file formats because there are many of them. Mainly they are .py Python files, .lammpstrj MD (molecular dynamics) trajectory files, .dat files and XYZ files. They are all fairly standard, text-based formats, so it will be easy to manage, control and check them. As for the reuse of the collaborator's code, it does not raise a red flag (at least for now): the code is well documented and he will keep working on it for quite some time. About storage, I am not sure whether I can use SURFdrive/OneDrive from the DelftBlue cluster that I use to run my simulations, but I will check it out!
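Because these formats are plain text, simple integrity checks are cheap to write. A minimal sketch for the XYZ case, assuming the common layout (an atom-count line, a comment line, then one `element x y z` line per atom, with frames concatenated). `read_xyz_frames` is a hypothetical helper, not part of the collaborator's codebase, and `.lammpstrj` dumps use a different `ITEM:`-header layout that it does not handle.

```python
def read_xyz_frames(text: str) -> list[list[tuple[str, float, float, float]]]:
    """Parse a (possibly multi-frame) XYZ file into a list of frames.

    Each frame is a list of (element, x, y, z) tuples. Raises ValueError
    when a frame is truncated or an atom line is malformed.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between/after frames
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])  # first line of a frame: atom count
        body = lines[i + 2 : i + 2 + n_atoms]  # skip the free-form comment line
        if len(body) != n_atoms:
            raise ValueError(f"frame starting at line {i + 1} is truncated")
        frame = []
        for line in body:
            fields = line.split()
            if len(fields) < 4:
                raise ValueError(f"malformed atom line: {line!r}")
            x, y, z = (float(v) for v in fields[1:4])
            frame.append((fields[0], x, y, z))
        frames.append(frame)
        i += 2 + n_atoms
    return frames
```

Comparing `len(frames)` against the number of steps a run was supposed to save catches output from jobs that were killed early; for routine analysis, established libraries such as ASE already read XYZ files.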

Thank you again!

EstherPlomp commented 11 months ago

Thanks for following up @pietro-sillano!

Fair enough that you have numerous file formats: as long as you have some examples of these for assignment 3 for the theme 'file formats' you should be all set.

And I'm very glad to hear that the collaborator is documenting the code well and is still working on it as well - then a red flag is less appropriate indeed!

Do let me know if you have any further questions about the storage options :)

EstherPlomp commented 11 months ago

Hey @pietro-sillano: I don't see a link or comment for assignment 3 - can you please still share your assignment or let me know if you need any help? Thanks!

pietro-sillano commented 11 months ago

Oh, I am sorry, I forgot to add the new link! I will fix it now.

EstherPlomp commented 11 months ago

Awesome, thank you!

EstherPlomp commented 11 months ago

And now with feedback on your assignment: well done @pietro-sillano!

Data Organisation

Data documentation

File formats

Data publication

pietro-sillano commented 11 months ago

@EstherPlomp Thank you for your feedback!

At the moment the naming convention looks good to me too, but I was worried it could blow up when I add a lot of parameter information/annotations to differentiate the simulations.
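One way to keep such names under control is to generate and parse them with code instead of typing them by hand. A minimal sketch, assuming a `prefix_key-value_key-value` scheme with keys sorted alphabetically (the scheme and the `vesicle`, `T`, `kappa` names are hypothetical). Keys may not contain `_` or `-`, and values may not contain `_`, so every name decodes unambiguously, including negative values such as `T--1.0`.

```python
def encode_run_name(prefix: str, params: dict) -> str:
    """Build a run name such as 'vesicle_T-1.0_kappa-20' from sorted parameters."""
    if "_" in prefix:
        raise ValueError("prefix must not contain '_'")
    for key, value in params.items():
        if "_" in key or "-" in key or "_" in str(value):
            raise ValueError(f"unparseable parameter: {key}={value}")
    # Sorting the keys makes the same parameter set always give the same name.
    parts = [f"{key}-{params[key]}" for key in sorted(params)]
    return "_".join([prefix, *parts])


def decode_run_name(name: str) -> tuple[str, dict[str, str]]:
    """Invert encode_run_name; values come back as strings."""
    prefix, *parts = name.split("_")
    params = {}
    for part in parts:
        key, value = part.split("-", 1)  # keys never contain '-', values may
        params[key] = value
    return prefix, params
```

Once the parameter list gets long, a common compromise is to put only the parameters that vary between runs in the name and store the full set in a small text file (for example JSON) next to the simulation outputs.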

I have already heard of Sphinx but have never used it; I will look into it!
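As a starting point for Sphinx: its `autodoc` extension pulls documentation out of docstrings, and the bundled `napoleon` extension understands the NumPy docstring style shown here (enabled in `conf.py` with `extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]`). The function itself is a hypothetical analysis helper, included only to illustrate the docstring sections.

```python
import math


def radius_of_gyration(positions):
    """Compute the radius of gyration of a set of equal-mass particles.

    Parameters
    ----------
    positions : sequence of tuple of float
        Particle coordinates as ``(x, y, z)`` tuples, in simulation units.

    Returns
    -------
    float
        Root-mean-square distance of the particles from their centre of mass.

    Raises
    ------
    ValueError
        If ``positions`` is empty.

    Examples
    --------
    >>> radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
    1.0
    """
    if not positions:
        raise ValueError("positions must not be empty")
    n = len(positions)
    # Centre of mass for equal masses is the mean position.
    centre = [sum(p[k] for p in positions) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return math.sqrt(mean_sq)
```

The `Examples` section doubles as a test: `python -m doctest -v analysis.py` (file name hypothetical) runs it and reports whether the output still matches.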

I added my comment to the License thread.

Thank you again