EstherPlomp / TNW-RDM-101

Self paced materials of the RDM101 course
https://estherplomp.github.io/TNW-RDM-101/
Creative Commons Attribution 4.0 International

Overview issue Max Theisen #124

MaxThFe closed this issue 7 months ago

MaxThFe commented 7 months ago

Introduction

Hi all, my name is Max Theisen. I am from Bonn in Germany and I am a first-year PhD student in Chemical Engineering.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My research group focuses on the application of AI in Chemical Engineering. I specifically work on AI for process operations. To do so, I simulate data or use data from the literature and from industry.

My research entails the following aspects:

Research aspect | Answer
--- | ---
Use/collect personal data (health data, interviews, surveys) | No
Use/collect experimental data (lab experiments, measurements with instruments) | No
Collaborate with industry | Yes
Write/develop software as the main output of the project | Yes
Use code (as in programming) for data analysis | Yes
Work with large data (images, simulation models) | Yes
Other | N/A

Reflections on the importance of RDM videos

Good data management is key in research because it helps us use and share findings easily. Bad code from colleagues and being unable to repeat others' experiments are big problems that I have experienced myself. With better organization, we can all work together more smoothly. AI research is a good example of this; chemical engineering research is a bad one.

What would you like to learn during this course?

I am always curious about better ways to make my code and data available. Managing large datasets in particular is sometimes challenging.

Checklist assignments

mcesarcarvalho commented 7 months ago

Hi Max! Your data flow map is well-structured and clear. 😊 It's good that you considered your file formats and sizes. I think we have similar projects in terms of our data (sensor data and trained ML models), and I was wondering about your backups, because OneDrive is limited to 1 TB. The project U-Drive allows for up to 5 TB. Have you considered the criteria for choosing which files to include in the backup?

EstherPlomp commented 7 months ago

Thanks for sharing assignment 2 @MaxThFe! It looks super nice and clear!

Good question @mcesarcarvalho - thanks for the feedback!

Some other suggestions/pointers:

ignasisg commented 7 months ago

Hi Max,

Your Data Flow Map 2 is very clear to me, even though I have no idea about your topic. It would be interesting to have a face-to-face chat about everything related to code. I feel you know better how to save and manage it, and I am starting to use more code in my research.

mcesarcarvalho commented 7 months ago

Hi @MaxThFe! I think your assignment 3 is really well thought out. It seems like you have a clear overview of how to document code (e.g., automatic code documentation), experiments, and version control. That is something I would like to discuss with you because I feel like you know a lot about it. As you are also collaborating with an industrial partner, I was wondering about the type of license that you will use. Do you think CC BY will cause problems regarding confidentiality? This topic is still a bit vague in my mind, because they are very strict about the content that is published.
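As a minimal illustration of what "automatic code documentation" can mean (assuming a Python codebase, which the thread does not specify), documentation can be written as a docstring next to the code it describes; tools such as `pydoc` or Sphinx's autodoc extension then read that same docstring, which the standard-library `inspect.getdoc` also exposes. The function `normalize_signal` below is hypothetical and only serves as an example.

```python
import inspect


def normalize_signal(values: list[float]) -> list[float]:
    """Scale a sequence of sensor readings to the range [0, 1].

    Parameters
    ----------
    values : list[float]
        Raw readings; must contain at least two distinct values.

    Returns
    -------
    list[float]
        Readings rescaled so the minimum maps to 0 and the maximum to 1.
    """
    low, high = min(values), max(values)
    if high == low:
        # Avoid division by zero when all readings are identical.
        raise ValueError("values must contain at least two distinct readings")
    return [(v - low) / (high - low) for v in values]


# Documentation tools extract the same docstring that inspect returns here.
print(inspect.getdoc(normalize_signal).splitlines()[0])
```

Because the description lives in the source file, it is versioned together with the code and stays in sync when the function changes.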

EstherPlomp commented 7 months ago

Thanks for sharing assignment 3 @MaxThFe! It looks super comprehensive and clear! Well done!

I have very little feedback, only some notes on the use of GitHub below!

Thanks also for your feedback @mcesarcarvalho! I think the license is perhaps not the most pressing issue regarding confidentiality; the issue will probably be more about what you can and cannot share publicly. Once you share something publicly with a license, it is out there and some competitive advantage may be lost, which is generally what industrial partners care about most. So as long as you have permission to share, I doubt they will be very concerned about the license! It is still important to discuss this with them, of course :) Hope that helps a bit in making this less abstract!