Closed: lizhenping closed this issue 2 years ago
@zhenpingli, thank you for suggesting this improvement to the workflow for make_dataset()! I currently do not have the capacity to integrate such changes into DIPS-Plus' main branch; however, if you would like, feel free to submit a pull request. I will review it and merge the changes if they are compatible with the current pipeline.
As a beginner, I have recently been doing some work on your dataset for a paper. The learning process will continue for two months. My tutor also asked me to do some work on visualizing and formatting your dataset, which may take two or three months to complete. I will submit the visualizations, formatting instructions, and pipeline code for handling the data to you. Additionally, thank you for your open-source work; DIPS-Plus is a great project.
Thank you, @zhenpingli, for filling me in and for the kind words. I will do my best to support you in the future if you have questions throughout the process. Best of luck in your endeavors!
As we can see in DIPS-Plus, the author said (in "about make dips dataset" #7, https://github.com/BioinfoMachineLearning/DIPS-Plus/issues/7) that the pipeline can hit a "deadlock of sorts after a certain number of complexes have been processed." When run for a long time, the deadlock appears by chance, but for a few files it always runs successfully, so I split up the make_dataset processing.
Here is the workaround. First, make six different folders, tmp1 through tmp6 (e.g. `mkdir tmp1`). Then `cd` into `pdb` and move a batch of files into each new folder, e.g. `mv $(ls | head -200) ../tmp6`. Finally, run make_dataset.py on each folder separately: `python3 make_dataset.py project/datasets/DIPS/raw/tmp1 project/datasets/DIPS/interim --num_cpus 24 --source_type rcsb --bound`
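The manual steps above can be sketched as a small helper script. This is a hypothetical sketch, not part of DIPS-Plus: the `split_raw` function name, the folder layout, and the chunk size of 200 files are assumptions based on the commands in this thread; adjust them to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DIPS-Plus): split a raw PDB folder into
# numbered chunk folders (tmp1, tmp2, ...) so make_dataset.py can be run
# on each chunk separately, as described in the workaround above.
set -u

split_raw() {
    # $1 = raw directory, $2 = number of chunks, $3 = files per chunk
    local raw_dir=$1 n_chunks=$2 chunk_size=$3
    local i f count
    for i in $(seq 1 "$n_chunks"); do
        mkdir -p "$raw_dir/../tmp$i"
        count=0
        for f in "$raw_dir"/*; do
            [ -e "$f" ] || break            # raw directory already empty
            mv "$f" "$raw_dir/../tmp$i/"
            count=$((count + 1))
            [ "$count" -ge "$chunk_size" ] && break
        done
    done
}

# Example usage (paths are assumptions; adjust to your layout):
# split_raw project/datasets/DIPS/raw/pdb 6 200
# for i in 1 2 3 4 5 6; do
#     python3 make_dataset.py "project/datasets/DIPS/raw/tmp$i" \
#         project/datasets/DIPS/interim --num_cpus 24 --source_type rcsb --bound
# done
```

Running each chunk in a separate `make_dataset.py` invocation means that if one run deadlocks, only that chunk needs to be restarted rather than the whole dataset.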