danieleongari / CURATED-COFs

Clean, Uniform and Refined with Automatic Tracking from Experimental Database (CURATED) COFs
MIT License

add 42 papers 84 COFs #31

Closed mpougin closed 2 years ago

mpougin commented 2 years ago

Hello @danieleongari, I finally managed to finish collecting the COFs; there were still some conflicts to resolve. Could you have a quick look at my commit? I assume the uniqueness errors mean I should remove the duplicates?

danieleongari commented 2 years ago

Thanks @mpougin, I'll give it a look ASAP, hopefully by the end of this week.

mpougin commented 2 years ago

thanks a lot!

danieleongari commented 2 years ago

Hi, I suggest a number of actions to make it ready for merging:

1 - Replace all the _space_group labels with the modern CIF convention, as we did here: https://github.com/danieleongari/CURATED-COFs/commit/cae0382cc534c1211dfc953451d1f300c0b7ae26 Sorry, I think we should have updated some of the other code we used for the parsing, to avoid this manual correction next time.
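A bulk rename of this kind could be scripted along the following lines. This is only a sketch: the mapping assumes the change is from the legacy `_symmetry_*` tags to their `_space_group_*` replacements in the core CIF dictionary, and the function name `modernize_tags` is hypothetical. The commit linked above remains the reference for what the repository actually expects.

```python
import re

# ASSUMED mapping from legacy core-CIF tags to their current replacements.
# Check it against the linked commit before using it on real files.
LEGACY_TO_MODERN = {
    '_symmetry_space_group_name_H-M': '_space_group_name_H-M_alt',
    '_symmetry_Int_Tables_number': '_space_group_IT_number',
    '_symmetry_equiv_pos_as_xyz': '_space_group_symop_operation_xyz',
}


def modernize_tags(cif_text):
    """Rename legacy tags that start a line; data values are left untouched."""
    for old, new in LEGACY_TO_MODERN.items():
        # CIF tags are case-insensitive; only match a whole tag at line start.
        pattern = re.compile(
            rf'^(\s*){re.escape(old)}(?=\s|$)',
            re.MULTILINE | re.IGNORECASE,
        )
        cif_text = pattern.sub(lambda m, new=new: m.group(1) + new, cif_text)
    return cif_text


sample = (
    'data_example\n'
    "_symmetry_space_group_name_H-M 'P 1'\n"
    '_symmetry_Int_Tables_number 1\n'
    '_cell_length_a 10.0\n'
)
print(modernize_tags(sample))
```

Applied to every file in cifs/, a helper like this would avoid editing each CIF by hand.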

2 - Run the checks on your computer, after installing them as described in: https://github.com/danieleongari/CURATED-COFs/blob/master/CONTRIBUTING.md

I ran them and they go fine until, for some reason, one breaks while reading a CIF:

mac CURATED-COFs git:(bca1958) pre-commit run --all-files
Fix double quoted strings................................................Passed
Fix End of Files.........................................................Passed
Mixed line ending........................................................Passed
Check that DOIs are unique...............................................Passed
Check that CURATED-COF IDs are unique....................................Passed
Check that all frameworks have a matching CIF............................Passed
Check that CURATED-COF names are unique..................................Passed
Check that CURATED-COF paper ids are consistent..........................Passed
Check that there are no overlapping atoms................................Failed

  File "./.github/validate.py", line 160, in overlapping_atoms
    raise ValueError(f'Unable to parse file {cif}') from exc
ValueError: Unable to parse file cifs/21331N2.cif

NOTE that you can find the functions of these checks here: https://github.com/danieleongari/CURATED-COFs/blob/master/.github/validate.py
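For orientation, the idea behind an overlapping-atoms check can be sketched in a few lines. This is not the code in .github/validate.py: the function name `overlapping_pairs`, the 0.5 Å threshold and the input format (fractional coordinates plus lattice row vectors) are assumptions made for illustration, and the sketch skips CIF parsing entirely, which is the step that fails in the log above.

```python
import itertools
import math


def overlapping_pairs(frac_coords, lattice, min_dist=0.5):
    """Return index pairs (i, j) closer than min_dist (in Angstrom).

    frac_coords: list of fractional coordinates [x, y, z].
    lattice: three row vectors a, b, c in Angstrom.
    min_dist: hypothetical threshold, not taken from validate.py.
    """
    def length(frac):
        # Convert a fractional vector to Cartesian and return its norm.
        cart = [sum(frac[k] * lattice[k][d] for k in range(3)) for d in range(3)]
        return math.sqrt(sum(x * x for x in cart))

    # Neighbouring periodic images to consider for each pair.
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = []
    for i, j in itertools.combinations(range(len(frac_coords)), 2):
        delta = [frac_coords[j][k] - frac_coords[i][k] for k in range(3)]
        delta = [d - round(d) for d in delta]  # wrap close to the origin
        dist = min(length([delta[k] + s[k] for k in range(3)]) for s in shifts)
        if dist < min_dist:
            pairs.append((i, j))
    return pairs


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
coords = [[0.0, 0.0, 0.0], [0.99, 0.0, 0.0], [0.5, 0.5, 0.5]]
print(overlapping_pairs(coords, cubic))
```

In the example, the first two atoms are 0.1 Å apart across the cell boundary, so a check that ignored periodic images would miss them.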

3 - Finally, when you push your commit to GitHub, it will run the same checks plus a similarity check based on the structure graph. All the instructions for what is run are again in the folder .github/ (@ltalirz was the expert on this kind of thing, and it is very useful to know how to implement it), and the results are reported in the "Actions" menu (at the top, close to "Pull requests"). There you can click through until you find the alerts:

Warning: 21331N2 and 21330N2 have the same structure graph hash
Warning: 21370N2 and 15050N2 have the same structure graph hash
Warning: 21411N2 and 21410N2 have the same structure graph hash
Warning: 21481N2 and 20460N2 have the same structure graph hash
Warning: 21482N2 and 20460N2 have the same structure graph hash
Warning: 21630N2 and 13150N2 have the same structure graph hash
Warning: 21631N2 and 14090N2 have the same structure graph hash
Warning: 21632N2 and 13050N2 have the same structure graph hash
Warning: 21633N2 and 15162N2 have the same structure graph hash

As these warnings show, a number of duplicates were found and need to be inspected.

Thanks for your effort. These checks are a bit annoying, but they make sure that manual errors are kept to a minimum.
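To make the inspection easier, warnings like these can be reproduced locally by grouping frameworks by their hash. The sketch below assumes the hashes are already computed and stored in a dictionary keyed by CURATED-COF ID. The function name `duplicate_warnings` and the placeholder hash values are hypothetical, and how the real workflow computes the structure graph hash is not shown here.

```python
from collections import defaultdict


def duplicate_warnings(hash_by_id):
    """Build one warning per framework whose hash matches an earlier ID."""
    groups = defaultdict(list)
    # Sorting puts the lowest (oldest) ID first in each group.
    for cof_id, graph_hash in sorted(hash_by_id.items()):
        groups[graph_hash].append(cof_id)

    warnings = []
    for ids in groups.values():
        first = ids[0]
        for other in ids[1:]:
            warnings.append(
                f'Warning: {other} and {first} have the same structure graph hash'
            )
    return warnings


# Placeholder hashes for illustration only.
example = {
    '21481N2': 'h1', '20460N2': 'h1', '21482N2': 'h1',
    '21330N2': 'h2', '21331N2': 'h2',
    '13050N2': 'h3',
}
for line in duplicate_warnings(example):
    print(line)
```

A shared hash only flags a candidate duplicate; the two CIFs still need to be compared by hand before one is removed.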

Daniele

mpougin commented 2 years ago

Thanks a lot for your feedback, @danieleongari. I will try to update my PR by the end of the week.

danieleongari commented 2 years ago

Hi @mpougin, is there any update on finalizing this PR?

mpougin commented 2 years ago

Hello @danieleongari, sorry for the delay. On it now

mpougin commented 2 years ago

Hello @danieleongari, I modified the PR accordingly.

danieleongari commented 2 years ago

Looks good to me, thanks @mpougin for your work and the patience to check all the red flags!

mpougin commented 2 years ago

Thank you for your help, @danieleongari. I will run the N2 screening soon, and then I can do the next update in July. It should go much faster in the second round.

danieleongari commented 2 years ago

Sure, @mpougin, let me know when you have the time to start and I'll send you the latest papers from the Twitter bot https://twitter.com/COF_papers