danielskatz opened 9 years ago
I would suggest adding to https://elixir-registry.cbs.dtu.dk
We have recently started the first-ever official catalog of software at LBNL, from the Computational Research Division. This was the primary initial task of the Software Engineering Management Committee (SEMC), which I led. See: http://crd.lbl.gov/software/ There is an ongoing effort to provide internal catalog functions as well, such as download tracking and the ability to automatically generate reports for the Innovation Partnerships Office (IPO). Software catalogs are intimately connected to issues of open-source and licensing.
I suppose one question would be whether there should be a single catalog, or different catalogs per domain. The ELIXIR registry seems fairly specific to life sciences, although it could certainly serve as a good example for a broader catalog. Similarly, the LBNL CRD software catalog seems specific to software developed at LBNL, but could also serve as a good example/starting point.
However, to propose an answer to my own question, I think a catalog for tools that aid sustainability should probably be multi/cross-disciplinary, since many of the tools would themselves be multi/cross-disciplinary. So, the two existing catalogs mentioned here would likely not be sufficient unless it were possible to extend them significantly. I think a better option may be to use them as a base on which to build.
It is less obvious who would support and maintain a single catalog, but even if there were one, CRD and, I presume, ELIXIR would like to preserve their domain- or organization-specific views of it. We are actually creating a simple Django-based backend for the internal functionality I mentioned, which would probably be a good starting point (or at least something to look at) for something more general. I can try to make that relatively clean and open-source it on GitHub before I come.
I would like to see some information in a catalog that would help filter the entries so that you could easily see new, popular, etc. software. This would reduce the support burden for a catalog, since out-of-date entries would naturally not be seen.
@dangunter: that sounds like a great starting point!
@gridrebel: I think a release or last updated date could certainly be one element of a catalog, in addition to labels based on functionality, relevant disciplines, etc.
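To make the filtering idea concrete, here is a minimal sketch of how a "last updated" date and functionality labels could drive the views discussed above. Everything in it is an assumption for illustration: `CatalogEntry`, its fields, and the helper functions are hypothetical and not taken from the LBNL, ELIXIR, or any other existing catalog.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    last_updated: date
    downloads: int = 0
    labels: set = field(default_factory=set)


def recently_updated(entries, today, max_age_days=365):
    """Return entries updated within max_age_days of today, newest first."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [e for e in entries if e.last_updated >= cutoff]
    return sorted(fresh, key=lambda e: e.last_updated, reverse=True)


def most_popular(entries, limit=10):
    """Return up to `limit` entries ordered by download count, highest first."""
    return sorted(entries, key=lambda e: e.downloads, reverse=True)[:limit]


def with_label(entries, label):
    """Return entries tagged with the given functionality or discipline label."""
    return [e for e in entries if label in e.labels]


# Made-up entries, purely for demonstration.
entries = [
    CatalogEntry("tool-a", date(2015, 8, 1), downloads=120, labels={"testing"}),
    CatalogEntry("tool-b", date(2009, 3, 15), downloads=900, labels={"documentation"}),
    CatalogEntry("tool-c", date(2015, 2, 10), downloads=40, labels={"testing", "astronomy"}),
]

fresh = recently_updated(entries, today=date(2015, 9, 12))
print([e.name for e in fresh])
```

With a default view built on something like `recently_updated`, stale entries such as the hypothetical `tool-b` drop out of sight without anyone having to delete them, which is the reduction in support burden suggested above.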
I wonder if the fact that general web catalogs have mostly disappeared in favor of search engines is relevant.
I think it is; keeping a catalog up to date is very labor intensive. Apache has projects provide a DOAP description file which they then crawl; funders could require projects to provide that as part of the annual reporting process, then crawl it.
http://oss-watch.ac.uk/resources/doap
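As a rough sketch of what the crawling side might look like, the snippet below parses a DOAP (RDF/XML) description into a plain dictionary that a catalog could store. It is an illustration under assumptions, not how Apache's crawler works: the sample project, the `parse_doap` helper, and the output layout are all hypothetical, and fetching the file over the network is left out. Only the DOAP and RDF namespace URIs and the element names (`Project`, `name`, `shortdesc`, `homepage`, `release`, `Version`, `revision`, `created`) come from the DOAP vocabulary.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
DOAP_PROJECT = "{http://usefulinc.com/ns/doap#}Project"

# A made-up DOAP file standing in for one that a project would publish.
SAMPLE_DOAP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>Example Tool</doap:name>
    <doap:shortdesc>A made-up project used only for illustration.</doap:shortdesc>
    <doap:homepage rdf:resource="https://example.org/tool"/>
    <doap:release>
      <doap:Version>
        <doap:revision>1.2.0</doap:revision>
        <doap:created>2015-09-01</doap:created>
      </doap:Version>
    </doap:release>
  </doap:Project>
</rdf:RDF>
"""


def parse_doap(xml_text):
    """Extract a few catalog-relevant fields from a DOAP RDF/XML document."""
    root = ET.fromstring(xml_text)
    # The Project element may be the root itself or nested inside rdf:RDF.
    project = root if root.tag == DOAP_PROJECT else root.find("doap:Project", NS)
    if project is None:
        raise ValueError("no doap:Project element found")

    homepage = project.find("doap:homepage", NS)
    releases = [
        {
            "revision": version.findtext("doap:revision", default=None, namespaces=NS),
            "created": version.findtext("doap:created", default=None, namespaces=NS),
        }
        for version in project.findall("doap:release/doap:Version", NS)
    ]
    return {
        "name": project.findtext("doap:name", default=None, namespaces=NS),
        "description": project.findtext("doap:shortdesc", default=None, namespaces=NS),
        "homepage": homepage.get(RDF_RESOURCE) if homepage is not None else None,
        "releases": releases,
    }


entry = parse_doap(SAMPLE_DOAP)
print(entry["name"], entry["releases"])
```

The release dates extracted this way could feed the "last updated" information a catalog needs, so the projects, not the catalog maintainers, carry the cost of keeping it current.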
As it is, projects are required to include the grant number on any publication (including websites, right?), so one might be able to leverage that (some set of pages to be crawled?).
Another option is to think about what would incentivize projects to keep information updated; the only thing I can think of is a website for registering your preferred citation for users of the software.
--James
Yeah, @danielskatz makes a good point... and keeping a catalog maintained would not only be labor-intensive but also potentially difficult, since it would require finding the software in the first place!
I like the DOAP suggestion, as it takes much of the work off the catalog and puts it onto the projects themselves. Then, they would just need to submit an appropriate link to the catalog maintainer (or the whole thing could be automated).
I agree that a preferred citation is one strong incentive, although I imagine that potentially having access to a larger user base (and thus more people to cite the work) would be one as well.
However, it just occurred to me that for this discussion I've been thinking more about a catalog for sustainable software itself, rather than tools that would support the development of such...
Although catalogs as a way of indexing "everything" have disappeared, catalogs are still alive and well. Amazon, iTunes, the App Store, etc. etc. This is, to me, the appropriate analogy for a software catalog -- "we" are showing our wares (with an "s"!). This is not new -- groups and labs commonly have webpages for such things, and the LBNL/CRD attempt is merely an attempt to broaden and standardize this effort slightly. What is a little different is the realization that we could leverage this catalog for the purposes of aiding with our parochial needs for tracking and reporting downloads (usage), and also gaining a sense of the adoption of software engineering practices across the entire portfolio.
I edit the Astrophysics Source Code Library (ASCL), and in my lightning talk at WSSSPE I am extending an offer of our infrastructure to anyone who wants to use it to build their own software registry/repository: we will give folks a clone of our infrastructure, and then they can change it however they'd like to suit their discipline/needs. The ASCL is built with open source tools that have really large userbases, so it's not hard to find people with skills using these tools, or to develop them. (The ASCL is a completely volunteer effort so this has been important to us.) We'll even host your site if you'd like at no cost!
You can see a mostly-emptied-out-of-our-stuff clone here. In addition to the functionality you can see as a user, there are administrator tools that let you stage a new entry before publishing it, assign a unique identifier to it, edit existing entries, etc., and some simple reports that are essentially ways of getting info out of the database in different formats. One of the ASCL's reports, for example, allows the main indexing service for astrophysics (Astrophysics Data System, or ADS) to pull updates in their preferred format.
If you want more info, please let me know; thanks!
There is a High Performance Math Software Catalog at http://wotug.org/parallel/nhse/rib/repositories/hpc-netlib/catalog/ . It was, however, generated back in 1999.
A 2015-16 NASA software catalog (2nd edition) is available as a PDF at https://software.nasa.gov . NASA only started publishing a software catalog in 2014.
Some universities publish catalogs of the software they provide to faculty, staff, and students; for example, https://it.stonybrook.edu/services/software-catalog/
Having been involved in the creation and running of three "general" software catalogues, I can confirm that they are a lot of work to keep updated (we had 2 FTE and it wasn't quite enough). A federated approach spreads the burden, and we did try using DOAP for a while but at the time (six years ago), the tooling wasn't good enough for the end users.
One of the benefits of e.g. the iTunes store / Google Play, is that the developers provide their own information using a set template.
I've looked at previous efforts to establish a repository/registry in astronomy, and one of the problems was too much metadata. Everyone loves to have it, but it's hard to keep a lot of metadata up-to-date, and that -- inability to keep the metadata current -- sank more than one of these other efforts. That's why the ASCL is so light. We have no full-time staff, just two part-time volunteers creating entries and vetting submissions. There are lots of things the ASCL can't do, but one of the things it has been able to do is survive!!
Note this NIH workshop report on their efforts to build a software discovery index: http://softwarediscoveryindex.org/report/
Also note the metadata harmonization effort in https://www.mozillascience.org/code-as-as-research-object-new-phase
I accidentally stumbled upon a math software service this afternoon, and I like it:
http://www.swmath.org
sctchoi, wow! That's fabulous; thank you!
I think that maintenance is one of the biggest challenges in running such catalogs. To run it efficiently, one has to actively engage with developers and encourage them to keep information up-to-date. The value of swMATH is that it's not a catalogue of the software (like e.g. a related ORMS catalogue), but it's a database of citations of mathematical software, drawing information from https://zbmath.org/.
swMATH allows users to submit new software packages at http://www.swmath.org/contribute/main and to suggest updates to existing entries. For example, I updated the version number for GAP in the summer, and my update was processed relatively quickly. On the other hand, swMATH does not track citations of a particular version of the software, but that's a more global problem, since the versioning information in the citation may not be accurate in the first place.
My suggestion is to recommend to other bibliographical databases, for example, MathSciNet, to treat software citations in a similar manner.
Creating and curating catalogs for software tools that aid sustainability (perhaps categorized by domain, programming languages, architectures, and/or functions, e.g., for code testing, documentation)