Assignment 1 [Ji Li] - Githubissues

JiLi24 commented 9 months ago

Introduction

My name is Ji Li. I am from the Faculty of Applied Science. I am from China and have been studying here for one and a half years. I love cooking and cycling.

Describe your research in 2-3 sentences to someone that is not from your field (please avoid abbreviations)

My ongoing research focuses on wastewater treatment processes, a crucial aspect of environmental protection. The key strengths in my work include the design and innovation of treatment technologies aimed at energy saving and recovery. Additionally, my career ethos prioritizes delivering added benefits to stakeholders and making substantial contributions to public environmental welfare.

My research entails the following aspects:

Research Aspect	Answer
Use/collect personal data (health data, interviews, surveys)	No
Use/collect experimental data (lab experiments, measurements with instruments)	Yes
Collaborate with industry	Yes
Write/develop software as the main output of the project	No
Use code (as in programming) for data analysis	No
Work with large data (images, simulation models)	No
Other:	N/A

Reflections on the importance of RDM videos

Many unexpected events can lead to the loss or destruction of our experimental data and documents. Losing data means having to recreate or even restart experiments, which wastes a lot of time and energy. After the long-term operation of my reactors, I often accumulate a significant amount of data. If I don't manage it well, I become confused, make mistakes, and sometimes even mess it up. I once had a terrible experience when my laptop crashed while I was working on my paper and analyzing data.

What would you like to learn during this course?

I would like to learn about tools that could help me manage my data, and to gain some experience in handling and sharing large amounts of data. I also want to keep the importance of data management in mind.

Checklist assignments

[x] Assignment 1: creating a GitHub issue (the overview issue) (before 15 February, 10:00)
[x] Respond to the GitHub issue #101
[x] Assignment 2: Data Flow Map 1 (share a link in this issue before 22 February 13:00). Link: https://surfdrive.surf.nl/files/index.php/apps/richdocuments/public?fileId=%7Bfile_id%7D&shareToken=XSgnw5xV7Cx77ol
[x] Provide feedback to at least one Assignment 2 from another participant
[x] Respond to the GitHub discussion #103
[x] Respond to the GitHub discussion #102
[x] Assignment 3: Data Flow Map 2 (share a link in this issue before 5 March 13:00). Link: https://surfdrive.surf.nl/files/index.php/s/RNxAj5TRDjFYG9U
[x] Provide feedback to at least one Assignment 3 from another participant
[x] Assignment 4: Data Management Plan (before 8 March, 13:00)
[x] Respond to the GitHub discussion #104 if you have any questions (optional)
[x] Assignment 5: Data Flow Map 3: submit your slide (before 11 March, 17:00)

SiemEerden commented 9 months ago

Hey Ji! I was reading your data flow map, and I was wondering a few things:

Why do you use a chat app for data storage? These are typically not designed for that, so maybe that is not very safe.
Why did you buy a drive of your own for data storage if the TU actually provides multiple options? If I read up about the company Baidu, I see there have been multiple reports of Baidu apps secretly gathering data, such as sensitive user data. It's a bit like why we shouldn't use Google Drive either, but with an extra layer: being based in China, it is subject to state censorship by the Chinese government, meaning that in principle the Chinese government has access to TU Delft research data. I would double-check whether this is allowed, or desirable.

https://www.zdnet.com/article/baidus-android-apps-caught-collecting-sensitive-user-details/ https://www.reuters.com/article/idUSKCN0VX0EW/

EstherPlomp commented 9 months ago

Thanks @18810704625 for sharing your assignment 2! It looks very clear!

I have the same comments/suggestions as @SiemEerden - thank you for the feedback and questions!

In addition: are you using a paper or electronic lab notebook? You could also consider this information as a data flow/set so that you can reflect upon how to store/manage this information for Assignment 3.

EstherPlomp commented 8 months ago

Thanks @JiLi24 for sharing the assignment!

Some suggestions:

To shorten your file names you can also take out the spaces. I don't think you're working with code/analysis scripts - otherwise that becomes even more important! Although you do mention GitHub later in the assignment - so it is not entirely clear to me if you're working with software or not!
What documentation ensures that you're always able to find these samples you'll need to store for the long term? Even with labels it can be a pain to find things back in the lab in my experience!
Regarding metadata: what are these parameters/conditions/steps?
While Word/Excel files are now interoperable, they're not necessarily open as you'll need a Microsoft license to make use of all of the functionalities. Open alternatives are .txt or .csv files.
Uploading data in supplementary materials is not following the FAIR principles since supplementary materials do not have their own DOI associated with them, so links might break in the long term, and if you publish behind a paywall the supplementary materials may become even more inaccessible. This is why supplementary materials can also be shared on a data repository such as 4TU.ResearchData! If needed, 4TU.ResearchData can also be used for restricted access datasets, where you/supervisors would need to approve access requests before people can reuse the data. This is only for confidential data as most data can be made publicly accessible!
While GitHub indeed allows for the assignment of a license, it is not a long term preservation platform and it does not assign DOIs to the code - Zenodo has a nice integration with GitHub that shares a snapshot of the repository on Zenodo and assigns this DOI that you then can link back to your GitHub Repo. You can also share your code via 4TU.ResearchData/GitHub, but the process is a bit more manual right now.
CC-BY is a license used for data but generally not for code: please check out the TU Delft approved software licenses: MIT, BSD, Apache, CC0, GPL, AGPL, LGPL, EUPL.
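
For anyone reading along who wants to try the first suggestion above (taking the spaces out of file names), here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `without_spaces` and the example file name are hypothetical and not taken from the assignment, and the function only builds the new name; it does not rename anything on disk.

```python
from pathlib import PurePath


def without_spaces(filename: str, separator: str = "") -> str:
    """Return the file name with whitespace removed from the part before the extension.

    `separator` is an optional replacement for the removed whitespace
    (for example "_" to keep the words visually apart).
    """
    path = PurePath(filename)
    # str.split() with no arguments splits on any run of whitespace,
    # so leading, trailing, and repeated spaces all disappear.
    stem = separator.join(path.stem.split())
    return stem + path.suffix


# Hypothetical example file name, for illustration only.
print(without_spaces("reactor 1 COD results 2024.xlsx"))
print(without_spaces("reactor 1 COD results 2024.xlsx", separator="_"))
```

Using an underscore as the separator keeps the name readable while still avoiding spaces, which may help with the worry that very short names become confusing after long-term storage.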

JiLi24 commented 8 months ago

Thanks for your very nice feedback, @EstherPlomp. Here are my answers to your comments/questions:

• To shorten your file names you can also take out the spaces. I don't think you're working with code/analysis scripts - otherwise, that becomes even more important! Although you do mention GitHub later in the assignment - so it is not entirely clear to me if you're working with software or not!
A1: Good idea. I also think the names are a little long sometimes. However, after long-term storage, a short name may confuse me. A separate explanatory document or file captions would be needed to list all the detailed information (data types, collection date, purpose, reactor condition, sample lists, etc.) about these documents. And indeed, I don't need to deal with code or scripts. All my data come from experiments and lab measurements.

• What documentation ensures that you're always able to find these samples you'll need to store for the long term? Even with labels it can be a pain to find things back in the lab in my experience!
A2: I totally agree with you. Quickly finding and identifying my samples after a long storage time is also a challenge for me. In addition to managing data, knowing how to label, store, and manage lab materials is also important and meaningful. Although our technicians have already trained us, we still need to exchange experiences and learn from the cases of other experienced people.

• Regarding metadata: what are these parameters/conditions/steps?
A3: Sorry for the confusion. I regard the basic, standard parameters and procedures in our measurements and analyses as the "special metadata", because they should not be changed if the measurements are to be reproduced. These always come from the standard book or manual.

• While Word/Excel files are now interoperable, they're not necessarily open as you'll need a Microsoft license to make use of all of the functionalities. Open alternatives are .txt or .csv files.
A4: Thank you for your advice. I understand the difference between them now. .txt or .csv files will be my preference for these types of data documents.

• Uploading data in supplementary materials is not following the FAIR principles since supplementary materials do not have their own DOI associated with them, so links might break in the long term, and if you publish behind a paywall the supplementary materials may become even more inaccessible. This is why supplementary materials can also be shared on a data repository such as 4TU.ResearchData! If needed, 4TU.ResearchData can also be used for restricted access datasets, where you/supervisors would need to approve access requests before people can reuse the data. This is only for confidential data as most data can be made publicly accessible!
A5: A very good discussion of data sharing. Most of our important data will be described in the publication text. In addition, there are some explanations and supplementary data that give more detailed procedures and parameters for our main results. As you mentioned, supplementary materials have obvious weaknesses regarding long-term storage and paywalls. 4TU.ResearchData would be the best choice for me to keep these datasets, as it helps us manage the data and approve access requests easily.

• While GitHub indeed allows for the assignment of a license, it is not a long term preservation platform and it does not assign DOIs to the code - Zenodo has a nice integration with GitHub that shares a snapshot of the repository on Zenodo and assigns this DOI that you then can link back to your GitHub Repo. You can also share your code via 4TU.ResearchData/GitHub, but the process is a bit more manual right now.
A6: Thanks for your recommendations. I will try to use 4TU.ResearchData and also the integration of Zenodo with GitHub for my sharing instead of only GitHub.

• CC-BY is a license used for data but generally not for code: please check out the TU Delft approved software licenses: MIT, BSD, Apache, CC0, GPL, AGPL, LGPL, EUPL.
A7: Thanks. I will check it later. I am not familiar with these different types of licenses.

EstherPlomp commented 8 months ago

Glad to hear the feedback is helpful! And thanks for clarifying and answering my questions!

If you're not producing analysis scripts/software that you will share, you can disregard the licenses for software (the last comment)!

Adarsh-Shajimon commented 8 months ago

Hi Ji, thanks for sharing the assignment.

As Esther said, .txt or .csv files are open file formats and I would suggest keeping a .pdf version of the protocols as well. The other files are easy to edit, hence changes that happen by mistake could lead to inaccuracies in the protocol. It would be difficult to spot minor errors also. Therefore, having a copy of the protocols in pdf can be helpful.

EstherPlomp / TNW-RDM-101

Assignment 1 [Ji Li] #123