Open petermr opened 4 years ago
Thank you @petermr for summarizing the content of the imported repos.
As far as I can tell, the modules within the jumboconverters-foo repos are superseded by the submodules within the jumbo-converters repo.
Similarly, newer versions of the dictionaries within the cml-dictionary-bar repos are contained within xml-cml.org.
I am thinking of making final commits to the jumboconverters-foo and cml-dictionary-bar repos, just adding a README.md with (in the case of jumboconverters-foo) the following content:
This repository is depricated as it's content has been inegrated into https://github.com/BlueObelisk/jumbo-converters .
This repository will remain in read-only state for reference.
and set it to a read-only state by archiving (see at the very bottom of e.g. https://github.com/BlueObelisk/jumboconverters-parent/settings ).
Updating pom.xml- and .(hg|git)ignore files.
We need to replace the Bitbucket URLs with the new URLs on github.com/BlueObelisk and replace the .hgignore with .gitignore.
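As a rough sketch of the URL change, the `<scm>` section of a migrated pom.xml might look like the following. The repository name `euclid` is only used as an example, and the exact connection strings are an assumption; each repo would need its own values.

```xml
<!-- Sketch only: SCM section after the move from Bitbucket (Mercurial) to GitHub (Git). -->
<!-- "euclid" is an example repository name; adjust per project. -->
<scm>
  <connection>scm:git:https://github.com/BlueObelisk/euclid.git</connection>
  <developerConnection>scm:git:git@github.com:BlueObelisk/euclid.git</developerConnection>
  <url>https://github.com/BlueObelisk/euclid</url>
</scm>
```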
Most parent POMs also set the UCC Repository, which no longer seems to be available:
<repositories>
  <repository>
    <id>ucc-repo</id>
    <name>UCC Repository</name>
    <url>https://maven.ch.cam.ac.uk/m2repo</url>
  </repository>
</repositories>
This brings me to the next point:
I believe we need a place to publish SNAPSHOT artifacts so that we can get CI going: we have plenty of Maven dependencies that point to SNAPSHOT versions, which are not available on "Maven Central". Without a Maven-repository from which the (Travis-)CI jobs can pull those artifacts, we would need to resort to building and installing each SNAPSHOT dependency again in downstream projects.
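I'm really not a Maven expert, but as far as I can tell publishing SNAPSHOT versions to "Maven Central" is at least discouraged. Also, according to this (https://help.github.com/en/github/managing-packages-with-github-packages/configuring-apache-maven-for-use-with-github-packages):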
GitHub Packages does not support SNAPSHOT versions of Apache Maven.
Does anyone have a suggestion how to proceed here?
IMHO it would be nice if we could publish SNAPSHOT versions directly from the CI pipeline, however I would like to avoid having to host our own Nexus server.
It seems with a few exceptions (oscar4, oscar4-cli, oscar4-chebi & ChemicalTagger) none of the repositories have a LICENSE.txt file in their root directory.
Can I assume that all Java projects are implicitly using the Apache 2.0 license, as they all inherit this setting from the WWMM Parent POM?
Some, but not all, also state the Apache license in the file headers. We could fix the missing file headers using the license-maven-plugin.
Notable exceptions from the Apache license are all the CML dictionaries, schema, conventions, website, etc., which are "CC BY 3.0", and three of the OSCAR repos, which seem to be under the "Artistic-2.0" license.
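A minimal sketch of how the missing headers could be added, assuming the `com.mycila:license-maven-plugin` is meant (there is more than one plugin with this artifactId) and using its 3.x configuration style. The version number, owner, email and include pattern are placeholders and are not taken from the WWMM Parent POM.

```xml
<plugin>
  <groupId>com.mycila</groupId>
  <artifactId>license-maven-plugin</artifactId>
  <version>3.0</version>
  <configuration>
    <!-- Apache 2.0 header template bundled with the plugin -->
    <header>com/mycila/maven/plugin/license/templates/APACHE-2.txt</header>
    <properties>
      <!-- Placeholder values, filled into the header template -->
      <owner>Example Owner</owner>
      <email>owner@example.org</email>
    </properties>
    <includes>
      <!-- Only touch Java sources in this sketch -->
      <include>src/**/*.java</include>
    </includes>
  </configuration>
</plugin>
```

With this in place, `mvn license:check` would report files that lack the header and `mvn license:format` would add it.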
We could probably host this site directly out of the GitHub repo. The HTML code uses Server-Side-Includes, which are not supported by GitHub-Pages, however we could rework the page into using Jekyll (Ruby-based templating engine).
I can have a shot at this at some point, but don't think I'll have time until the summer.
Interesting to know who holds the xml-cml.org domain.
I'll take the sections separately...
On Mon, Dec 30, 2019 at 7:24 PM Oliver Stueker notifications@github.com wrote:
Archiving superseded repos
As far as I can tell the modules within the jumboconverters-foo repos are superseded by the submodules within the jumbo-converters repo.
Sounds right.
Similarly newer versions of the dictionaries within the cml-dictionary-bar repos are contained within xml-cml.org.
I don't think the dictionaries are in active use (though I hope we can develop them), so pick whichever seems more up to date. The dictionaries will become much more valuable if we can link them to Wikidata. We have done a lot of this recently and it makes the dictionaries more authoritative.
I am thinking of making final commits to the jumboconverters-foo and cml-dictionary-bar repos, just adding a README.md with (in the case of jumboconverters-foo) the following content:
Yes, fix typos below
This repository is (deprecated) as (its) content has been (integrated) into https://github.com/BlueObelisk/jumbo-converters .
This repository will remain in read-only state for reference.
and set it to a read-only state by archiving (see at the very bottom of e.g. https://github.com/BlueObelisk/jumboconverters-parent/settings ).
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
On Mon, Dec 30, 2019 at 7:24 PM Oliver Stueker notifications@github.com wrote:
Thank you @petermr for summarizing the content of the imported repos.
Archiving superseeded repos
superseded
Updating pom.xml- and .(hg|git)ignore files.
We need to replace the Bitbucket URLs with the new URLs on github.com/BlueObelisk and replace the .hgignore with .gitignore.
Most parent POMs also set the UCC Repository, which no longer seems to be available:
<repositories>
  <repository>
    <id>ucc-repo</id>
    <name>UCC Repository</name>
    <url>https://maven.ch.cam.ac.uk/m2repo</url>
  </repository>
</repositories>
I think this could be replaced by Maven Central
This brings me to the next point:
CI-CD and publishing to Maven Repos
I believe we need a place to publish SNAPSHOT artifacts so that we can get CI going: we have plenty of Maven dependencies that point to SNAPSHOT versions, which are not available on "Maven Central".
AFAICR Maven Central requires numbered versions (which is a good thing as SNAPSHOT can refer to many versions).
Without a Maven-repository from which the (Travis-)CI jobs can pull those artifacts, we would need to resort to building and installing each SNAPSHOT dependency again in downstream projects.
I'm really not a Maven expert, but as far as I can tell publishing SNAPSHOT versions to "Maven Central" is at least discouraged. Also, according to this https://help.github.com/en/github/managing-packages-with-github-packages/configuring-apache-maven-for-use-with-github-packages
GitHub Packages does not support SNAPSHOT versions of Apache Maven.
Does anyone have a suggestion how to proceed here?
We should create proper versions. This is only a problem if there are many interdependencies of repos, e.g. A>B>C, so that if A changes then B and C must be verified. I ran into this problem with AMI, which had a stack of nearly 10, and so I bundled them all together.
IMHO it would be nice if we could publish SNAPSHOT versions directly from the CI pipeline, however I would like to avoid having to host our own Nexus server.
Agree with sentiment
On Mon, Dec 30, 2019 at 7:24 PM Oliver Stueker notifications@github.com wrote:
Thank you @petermr for summarizing the content of the imported repos.
License
It seems with a few exceptions (oscar4, oscar4-cli, oscar4-chebi & ChemicalTagger) none of the repositories have a LICENSE.txt file in their root directory.
Probably true.
Can I assume that all Java projects are implicitly using the Apache 2.0 license, as they all inherit this setting from the WWMM Parent POM?
Yes. I think all authors came from PMR group or close associates.
Some, but not all, also state the Apache license in the file headers. We could fix the missing file headers using the license-maven-plugin.
Agreed
Notable exceptions from the Apache license are all the CML dictionaries, schema, conventions, website, etc., which are "CC BY 3.0", and three of the OSCAR repos, which seem to be under the "Artistic-2.0" license.
We picked Artistic before Apache became common. I doubt there are many authors who would object to a change to Apache. Suggest we post this suggestion on Blue Obelisk and give a deadline after which we convert.
Updating the xml-cml.org website.
We could probably host this site directly out of the GitHub repo. The HTML code uses Server-Side-Includes, which are not supported by GitHub-Pages, however we could rework the page into using Jekyll (Ruby-based templating engine).
I can have a shot at this at some point, but don't think I'll have time until the summer.
Interesting to know who holds the xml-cml.org domain.
I think Henry does.
I've added the README.md files with a deprecation message and then archived those jumboconverters-* and cml-dictionary-* repos where I was very confident that more recent versions of their content are present in the jumbo-converters and xml-cml.org repos.
Similarly newer versions of the dictionaries within the cml-dictionary-bar repos are contained within xml-cml.org.
I don't think the dictionaries are in active use (though I hope we can develop them), so pick whichever seems more up to date. The dictionaries will become much more valuable if we can link them to Wikidata. We have done a lot of this recently and it makes the dictionaries more authoritative.
Back in 2015/2016 our group did some work on the CML dictionaries, which is now waiting to be merged in xml-cml.org/#1 (https://github.com/BlueObelisk/xml-cml.org/pull/1).
[...] AFAICR Maven Central requires numbered versions (which is a good thing as SNAPSHOT can refer to many versions).
[...] We should create proper versions. This is only a problem if there are many interdependencies of repos, e.g. A>B>C, so that if A changes then B and C must be verified. I ran into this problem with AMI, which had a stack of nearly 10, and so I bundled them all together.
Yes, bundling dependent packages is one solution to this problem. Another would be releasing more frequently: whenever project "A" implements a new feature or fixes a bug that project "B" needs, a new release of project A is created. As long as Semantic Versioning (https://semver.org/) is used, the impact on downstream projects should not be dramatic. And if CI pipelines exist, testing happens frequently and problems won't stay hidden for very long.
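To illustrate the difference, a downstream project "B" would then depend on a fixed release of "A" instead of a SNAPSHOT. The coordinates and version numbers below are hypothetical and only show the shape of the declaration.

```xml
<!-- Hypothetical coordinates, for illustration only -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>project-a</artifactId>
  <!-- A fixed release version instead of e.g. 1.2.0-SNAPSHOT -->
  <version>1.2.0</version>
</dependency>
```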
IMHO it would be nice if we could publish SNAPSHOT versions directly from the CI pipeline, however I would like to avoid having to host our own Nexus server.
Agree with sentiment
I'll probably run some experiments with GitHub Actions and GitHub Packages soon, using wwmm-parent, euclid and cmlxom. Staying within the GitHub platform should make it easy to use the necessary GH_TOKENS for authentication against the repo.
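A sketch of what the publishing side of such an experiment could look like in a pom.xml, following the GitHub Packages documentation for Maven. `BlueObelisk/euclid` stands in for the actual owner/repository, and the `github` id is an assumption that has to match a `<server>` entry in settings.xml holding the token.

```xml
<distributionManagement>
  <repository>
    <!-- Must match the <server><id> in settings.xml that carries the token -->
    <id>github</id>
    <name>GitHub Packages</name>
    <!-- Example owner/repository; adjust per project -->
    <url>https://maven.pkg.github.com/BlueObelisk/euclid</url>
  </repository>
</distributionManagement>
```

A CI job could then publish with `mvn deploy`, provided the token is made available to Maven.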
Can I assume that all Java projects are implicitly using the Apache 2.0 license, as they all inherit this setting from the WWMM Parent POM?
Yes. I think all authors came from PMR group or close associates.
Good. Whenever I start working on a repo that is still lacking a LICENSE file, I will create a pull request adding the Apache license.
Peter, if you don't mind I'll assign those PRs to you so that you can merge them. There won't be merge-conflicts so it will be a simple click.
We picked Artistic before Apache became common. I doubt there are many authors who would object to a change to Apache. Suggest we post this suggestion on Blue Obelisk and give a deadline after which we convert.
As far as I could see earlier, the Artistic-2.0 license is only used by some OSCAR repos. To me, one of the licenses is as good as the other. I would leave it up to @petermr and @mjw99 to decide whether to change the license or not.
Updating the xml-cml.org website.
We could probably host this site directly out of the GitHub repo. The HTML code uses Server-Side-Includes, which are not supported by GitHub-Pages, however we could rework the page into using Jekyll (Ruby based templating engine).
I can have a shot at this at some point, but don't think I'll have time until the summer.
Interesting to know who holds the xml-cml.org domain.
I think Henry does.
Pinging @hrzepa .
Much of our software is analogous to mines - its value varies according to what the world is interested in. For example, if people are interested in extracting data from Gaussian log files, jumbo-converters can do this. There's a cyclic gotcha - people won't mine logfiles unless there is working software, and it's a labour of love to write software in advance of demand. What you (Oliver) have done is very valuable - preserving the reserves and making them more accessible. My hope is that if they were displayed again then people might pick them up and use them and start again.
I think the next action is probably to create a spreadsheet/webpage prospectus of what there is, what it does, hopefully an example or two, and also preserve the history and authorship.
Imported repos 20191230:
euclid
A Java library for 2D and 3D geometric calculations.
This is fundamental to several other PMR repos, but development is now in the monolithic github.com/petermr/ami3
chemicaltagger
ChemicalTagger is a tool for semantic text-mining in chemistry.
standalone and (I think) currently working well thanks to mjw.
oscar4-cli
A set of small programs to run bits of the OSCAR4 software.
I think this is standalone and working but no evidence.
oscar4
OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientific articles.
I think this is working well (mjw).
cmlxom
A Java library for processing CML.
XOM is still a widely used Java tool for XML and this should work with any later versions.
xml-cml.org
I am using issues in this repo to discuss general problems. Please feel free to transfer elsewhere.
jumbo-converters
Converters for legacy to and from CML
A large set of modules for converting legacy output into CML. Anything called jumbo-converters-foo is likely to be a module.
cmllite-validator-code
CML validation is best done via xpath expressions, not XML schema
cmllite-validator-ws
uses XSLT
jumbo6
A Java editor/browser for CML. Almost certainly out of date.
chemicaltagger-webapp
No recent PMR knowledge
svg
A XOM for SVG.
Superseded by svg in http://github.com/petermr/ami3. In wide use.
html
A XOM for HTML.
Superseded by html in http://github.com/petermr/ami3. In wide use.
acpgeo
No immediate comment
jumboconverters-compchem
reads semi-formatted (lineprinter) output from about 10 packages
Useful but needs editing if the output formats change.
wwmm-pom
jumbo-testutil
Utilities to support unit tests
euclid-testutil
Utilities to support unit tests
jumboconverters-parent
parent module for jumbo-converters
cifxom
XOM for CIF (crystallography) files. I would touch base with COD - I suspect this is obsolete.
jumboconverters-cli
CLI for running jumboconverters
I suggest changing to picocli.net - a much better CLI.
jumboconverters-molecule
reads a variety of legacy molecule formats into CML
jumboconverters-top
toplevel module
jumbo-inchi
no immediate knowledge.
schtml
one of many attempts to get a normalised version of HTML for scientific articles
oscar4-uima
UIMA is an (IBM) Open source tool for running conformance and evaluation operations
pub-crawler
crawler for scientific articles/data
Probably obsolete
oscar4-chebi
ChEBI is an EBI chemical library
Suspect this is obsolete.
http-crawler
???
crystaleye
CIF crawler and database
Now merged with COD, so obsolete.
jumboconverters-*:
jumboconverters-spectrum
converts legacy spectra to cml-spect
jumboconverters-template
a general reader for semi-structured documents (e.g. FORTRAN output)
jumboconverters-react
jumboconverters-crystal
jumboconverters-composite
chemtreebank
???
oscar4-taverna
OSCAR under the Taverna workflow
Probably obsolete.
quixote-dicts
dictionaries for the Quixote project
cml-dicts
CML dictionaries
cml-specs
CML specifications
cml-dictionary-*:
cml-dictionary-compchem
compchem dictionary
PMR: we now have a new approach to dictionaries in ami3
cml-dictionary-units-nonsi
cml-dictionary-unit-types
cml-dictionary-compchem-nwchem
cml-dictionary-compchem-gaussian
cml-dictionary-cml-formula
cml-dictionary-cml-name
cml-dictionary-cml
cml-dictionary-units-si
cml-dictionary-cif