Assignment 1 [Mohammad Safari]

msafariii commented 1 year ago

Introduction

Hello, everyone! I'm Mohammad, a Ph.D. candidate in the Department of Geoscience, specializing in Applied Geophysics.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My research is dedicated to the application of artificial intelligence in seismic imaging. It involves analyzing seismic imaging algorithms so that they can take advantage of high-performance computing techniques and machine learning methods. The aim is to achieve higher-resolution images and faster algorithms.

My research entails the following aspects:

Research Aspect	Answer
Use/collect personal data (health data, interviews, surveys)	No
Use/collect experimental data (lab experiments, measurements with instruments)	Yes
Collaborate with industry	Yes
Write/develop software as the main output of the project	No
Use code (as in programming) for data analysis	Yes
Work with large data (images, simulation models)	Yes
Other:	N/A

Reflections on the importance of RDM videos

The videos are all about the significance of data management for transparency, ethics, and reproducibility in science. Setting up proper organization may be a little hard initially, but it ultimately yields long-term benefits. During my geophysics studies, I struggled to access existing scientific code, which hindered my work. If everyone had used GitHub, my tasks would have been much easier. Additionally, organizing my code and maintaining version history became essential, especially when code changes went awry. Having a structured system with a change history allowed me to revert to a working version when needed.

What would you like to learn during this course?

In our work, we occasionally receive sensitive data from oil companies that they consider highly confidential and prohibit us from sharing elsewhere. I would appreciate guidance on how to handle such situations.

Checklist assignments

[x] Assignment 1: creating a GitHub issue (before 12 September, 10:00)
[x] Respond to the GitHub issue #65
[x] Assignment 2: Data Flow Map 1 (share a link in this issue before 20 September 13:00). Link: https://tud365-my.sharepoint.com/:p:/r/personal/mohammadsafari_tudelft_nl/Documents/RDM101_Assignment1_Week1_DataFlowMap_Mohammad_Safari.pptx?d=w96cc7529a43345e4846fbdd0b304cfd9&csf=1&web=1&e=CUdQxx
[x] Provide feedback to at least one Assignment 2 from another participant
[x] Respond to the GitHub discussion #66
[x] Respond to the GitHub discussion #67
[x] Assignment 3: Data Flow Map 2 (share a link in this issue before 4 October 13:00). Link: https://tud365-my.sharepoint.com/:p:/g/personal/mohammadsafari_tudelft_nl/ESl3SqVUmE9JjFtpyhJbxPYBMT1lTzNqox_pljmrsp_2Mg?e=dvVq0D
[x] Provide feedback to at least one Assignment 3 from another participant
[x] Assignment 4: Data Management Plan (before 11 October, 13:00)
[x] Respond to the GitHub discussion #68 if you have any questions (optional)
[x] Assignment 5: Data Flow Map 3: submit your slide (before 17 October, 17:00)

EstherPlomp commented 1 year ago

Thanks for sharing assignment 2 @msafariii! I really like how you visualised the workflow - well done!

I just have two points as feedback:

Will the modified JMI code also be confidential if it builds on older code that is confidential? And will that have a potential effect on the Marmousi synthetic model?
Is there any documentation or metadata that could be its own separate dataset? Or is that included in the current datasets?

ieacdroste commented 1 year ago

Hi @msafariii, your data flow map looks very clear! What is the size and file format of the data that is produced by your code?

msafariii commented 1 year ago

Thanks for your comment. Regarding your first point:

Yes, the modified JMI code builds on older code that is confidential. It should therefore also be treated as confidential, so that the proprietary elements of the original code remain protected.

Regarding your second point:

Marmousi Model Metadata: The Marmousi model, as originally distributed, usually comes with documentation or metadata that describes the model, such as its origin, the physics it incorporates, its dimensions, and its sampling intervals. This information ensures that users of the model can interpret it correctly and understand its limitations.

JMI Outputs and Metadata: When JMI or any other inversion method is applied to the Marmousi model (or any other dataset), the results typically include not just the inverted subsurface properties (such as velocity or impedance models) but also some metadata. This metadata could include the parameters used in the inversion, the algorithm's settings, the computational resources used, iteration counts, and convergence metrics.
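One way to keep that kind of run metadata alongside the results is a small sidecar file. The following is only a minimal sketch in Python under stated assumptions: the `InversionMetadata` class, its field names, and the `.meta.json` naming are hypothetical illustrations, not part of the JMI code or the Marmousi distribution.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path
from typing import List


@dataclass
class InversionMetadata:
    """Hypothetical record of one inversion run (all field names are illustrative)."""
    algorithm: str
    input_model: str
    max_iterations: int
    iterations_run: int
    misfit_per_iteration: List[float] = field(default_factory=list)
    notes: str = ""


def write_sidecar(result_path: Path, meta: InversionMetadata) -> Path:
    """Write the metadata next to the result file as <result name>.meta.json."""
    sidecar = result_path.with_name(result_path.name + ".meta.json")
    sidecar.write_text(json.dumps(asdict(meta), indent=2))
    return sidecar


def read_sidecar(sidecar: Path) -> InversionMetadata:
    """Load a sidecar file back into an InversionMetadata record."""
    return InversionMetadata(**json.loads(sidecar.read_text()))
```

Calling `write_sidecar(Path("velocity_model.bin"), meta)` would create `velocity_model.bin.meta.json` in the same folder (the file name here is a placeholder). Because the sidecar is a separate plain-text file, it could be shared or published as its own small dataset even when the result it describes has to stay confidential.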

msafariii commented 1 year ago

Hi @ieacdroste, thanks for your comment. The input data can range from gigabytes to terabytes in size. The intermediate processing steps also handle gigabytes to terabytes of data, while the final output of my code typically ranges from megabytes to gigabytes.

EstherPlomp commented 1 year ago

Yes, the modified JMI code is building upon a confidential older code. In that case, it too should be considered confidential to ensure that the proprietary elements of the original code are protected.

💯 Sounds like that is indeed the case!

Thanks also for your further explanations, @msafariii, and for responding to @ieacdroste's feedback! (Thanks also for the feedback!) I'll leave it up to you whether you still want to update your data flow map - the assignment deadlines have now passed, so that may be a bit redundant :)

EstherPlomp commented 1 year ago

Thanks for sharing assignment 3 @msafariii! It looks great again, well done!

Just some minor thoughts from my side:

File formats

If some of the files are confidential, it may be redundant to have them in open formats. For data/code that you can share, it makes more sense!

Data publication

When sharing code and using GitHub, it might make more sense to share things under the MIT license (for which you indicated your preference in #66).

ieacdroste commented 1 year ago

Hi @msafariii, your assignment 3 looks clear and detailed. I was wondering where you save the raw data. You mentioned it can be up to terabytes in size. Do you store it on OneDrive, and does that have a storage limit?

msafariii commented 1 year ago

Thank you for your comment. Regarding your feedback, I appreciate your insights:

File Formats: You make a valid point about file formats. It's essential to consider the sensitivity of the data and whether open formats are suitable. I received the JMI code as .py files, which is an open format, but we may need to change the format during our work.

Data Publication: Thank you for the reminder about the MIT license. It's crucial for maintaining clarity and consistency in our code-sharing efforts.

msafariii commented 1 year ago

Hi @ieacdroste, thanks for your comment. You're right; sometimes the data can be really huge, even more than a terabyte. But honestly, I haven't yet reached the point where I'm dealing with real data of that size. Sometimes we use complex synthetic data from big companies, which is smaller and more manageable. If I needed to work with data of that size, I wouldn't use OneDrive as my central storage because of its storage limit.

EstherPlomp / TNW-RDM-101