EstherPlomp / TNW-RDM-101

Self paced materials of the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Assignment 1 [Alessandro Longhi] #81

Closed Aleartulon closed 9 months ago

Aleartulon commented 1 year ago

Introduction

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My research entails the following aspects:

| Research Aspect | Answer |
|---|---|
| Use/collect personal data (health data, interviews, surveys) | No |
| Use/collect experimental data (lab experiments, measurements with instruments) | No |
| Collaborate with industry | Yes |
| Write/develop software as the main output of the project | Yes |
| Use code (as in programming) for data analysis | Yes |
| Work with large data (images, simulation models) | Yes |
| Other | N/A |

Reflections on the importance of RDM videos

Data management is important, especially because it is generally not taught, and young researchers may think it is not useful until something goes wrong. Nothing bad has happened to me personally, but in my lab a laptop was accidentally unplugged from its charger and all the data were lost.

What would you like to learn during this course?

Checklist assignments

EstherPlomp commented 1 year ago

Hi @aleartulon! Thanks for submitting your first assignment! Can you try to edit your 'issue' using the three dots in the right corner of it? Then if you remove all the < !-- in the text, it will also show up in the preview/submitted version. Now it is hidden.

I hope that makes sense - let me know if you need any help with that.

EstherPlomp commented 1 year ago

Copied from #98:

Hi, this is the link to my template

https://surfdrive.surf.nl/files/index.php/s/rVvXKdWvIIF8sG5

giacomolastrucci commented 1 year ago

Hi Alessandro,

Looks good! The only question I have, given the common framework in which we will be working, is: do you already have such training data from numerical simulations, or do you have to run the simulations yourself to get the data? Consequently, could you have multiple datasets (e.g., different operating ranges), or do you only have to store one dataset?

Aleartulon commented 1 year ago

Hi Giacomo, thank you!

In principle, other partners within my PhD project are going to provide the data, but I also have the code to run the simulations myself. I think I will use other people's data at the beginning, but at some point it will be easier for me to decide which data to obtain in order to improve the ML predictions. So yes, it is very likely that I will have to store multiple datasets, or at least access different datasets on the clusters of other research groups.

EstherPlomp commented 1 year ago

Thanks for sharing your assignment @Aleartulon, well done!

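I think a couple of things could be further clarified:

  • With webdrive, do you mean the network drives of TU Delft? So the Group/Bulk and Project drives?
  • I don't see any file format and sizes listed: particularly the file formats will be important for assignment 3!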

And, as you're working with commercial data, GitLab could be a more secure storage solution compared to GitHub. For the TU Delft GitLab instance everything is managed by TU Delft instead of GitHub/Microsoft that could use the code for their own purposes.

Hope that helps!

Aleartulon commented 1 year ago

Hello, please find my Assignment 3 here! Let me know if you have any comments or suggestions.

https://surfdrive.surf.nl/files/index.php/s/gLVQzDnsxmvy954

Aleartulon commented 1 year ago

Thanks for sharing your assignment @Aleartulon, well done!

I think a couple of things could be further clarified:

  • With webdrive, do you mean the network drives of TU Delft? So the Group/Bulk and Project drives?
  • I don't see any file format and sizes listed: particularly the file formats will be important for assignment 3!

And, as you're working with commercial data, GitLab could be a more secure storage solution compared to GitHub. For the TU Delft GitLab instance everything is managed by TU Delft instead of GitHub/Microsoft that could use the code for their own purposes.

Hope that helps!

Hi, thank you for the comments.

EstherPlomp commented 1 year ago

Thanks for sharing Assignment 3 @Aleartulon! And for responding to the feedback! No worries if you're not sure about the sizes yet, estimations are more than sufficient! And well done on the rest of the assignment - it looks clear!

Some pointers from my side:

Metadata

File Formats

Access

Data Publication

giacomolastrucci commented 12 months ago

Hi Alessandro, well done, this is a very good overview of your project data in Assignment 3. I think you have already received the most important feedback from Esther. I can only suggest another tool (besides TensorBoard) for keeping track of your training statistics, models and metadata. It is called Weights & Biases: it is very convenient, easy to implement, cloud-based and very comprehensive. Check it out!

EstherPlomp commented 11 months ago

That looks interesting @giacomolastrucci! Thanks for sharing!