ReScience / submissions

ReScience C submissions

Reproducibility of 'Poincaré dodecahedral space parameter estimates' #41

broukema closed this issue 3 years ago

broukema commented 4 years ago

Original article: Roukema, B. F., Z. Buliński, and N. E. Gaudin, 2008, Poincaré dodecahedral space parameter estimates, Astronomy and Astrophysics, 492, 657 arXiv:0807.4260, https://ui.adsabs.harvard.edu/abs/2008A&A...492..657R

PDF URL: https://codeberg.org/boud/RBG08/raw/branch/master/Roukema_ReSciC2020.pdf
Metadata URL: https://codeberg.org/boud/RBG08/src/branch/master/metadata.yaml
Code URL: https://codeberg.org/boud/0807.4260 (and swh)

Scientific domain: Astronomy (Cosmology)
Programming language: C + Fortran 77
Suggested editor: Based on Fortran + C + scientific field, per the official profiles, @pdebuyl would seem to be the most appropriate

This is Paper number 12 at https://github.com/ReScience/ten-years/issues/1.

broukema commented 4 years ago

Here are some post-submission thoughts that I could integrate into the text after I receive a review (or reviews), if the reviewer(s) allow me to make some changes:

(1) Here's a relevant comment from the Dept of Physics at the Uni of Oxford - https://www2.physics.ox.ac.uk/it-services/where-has-g77-gone-to

Part of the reason for the transition from g77 to gfortran is to make mixing-in with C code simpler, and avoid (most of) the acts of cruel and unusual programming which were previously required to get the compilers' outputs to co-operate. Inevitably, the results of said acts were almost inevitably fragile and non-portable.

So part of the results of this paper can be reworded colourfully by stating that "acts of cruel and unusual programming" that seemed appropriate in 2008 are difficult to update in terms of the current (2020 Debian stable) software environment. In more formal terms, again quoting the same source, the necessary acts were "almost inevitably fragile and non-portable" - which indeed is the case here.

(2) Another useful observation is that although in principle it should have been possible to recompile g77 from source, in the same way as Debian stable did in 2008 or close enough to it, I have not been able to find g77 source code from mainstream sources for that epoch; I only found it for 1996-1998:

khinsen commented 4 years ago

This sounds like a problem that Software Heritage could help with, but I don't know how to proceed, so I asked the question on Twitter.

broukema commented 4 years ago

@khinsen A side issue, which is in fact interesting for reproducibility, is that on Twitter you said "1995-1998", but the pre-0.5.18 files, from 1995, are only diff files:

| File | Date | Size |
| --- | --- | --- |
| g77-0.5.8-0.5.9.diff.gz | 1995-02-21 03:00 | 16K |
| g77-0.5.9-0.5.10.diff.gz | 1995-02-22 03:00 | 3.9K |
| g77-0.5.10-0.5.11.diff.gz | 1995-02-23 03:00 | 5.0K |
| g77-0.5.11-0.5.12.diff.gz | 1995-02-23 03:00 | 1.6K |
| g77-0.5.12-0.5.13.diff.gz | 1995-02-25 03:00 | 10K |
| g77-0.5.14-0.5.15.diff.gz | 1995-05-19 03:00 | 12K |
| g77-0.5.15-0.5.16.diff.gz | 1995-08-30 03:00 | 168K |
| g77-0.5.16-0.5.17.diff.gz | 1995-11-19 03:00 | 36K |
| g77-0.5.17-0.5.18.diff.gz | 1996-04-01 03:00 | 436K |
| g77-0.5.18-0.5.19.diff.gz | 1996-12-15 03:00 | 94K |

Unless you have g77-0.5.8 with at most very tiny changes, the 1995 diff files only give clues to the full source code and in practical terms are close to useless. The person maintaining this at the time presumably wanted to minimise wasting bandwidth and disk space, and wanted to make it straightforward for someone updating to verify what was changing. It didn't occur to him/her that the source of g77-0.5.8 itself might be unavailable.

That's why I deliberately said "1996-1998". :) This doesn't change the curious difficulty in finding the 2007-2008-ish sources. Obviously, the most likely explanation is people being busy, wanting to move forward, and not thinking that long-term reproducibility might be important.

khinsen commented 4 years ago

@broukema Your explanation is probably correct, but I still find it surprising that a piece of software that once was widely used and part of a flourishing project (GCC) can disappear so easily.

broukema commented 4 years ago

@khinsen I fully agree that it's surprising. Even if someone eventually finds the sources, I think the as-far-as-we-know-disappearance of the g77 source would probably be worth adding to the abstract.

broukema commented 4 years ago

@jcburley Do you know of any g77 source tarballs for around the years 2006-2008 located on a reputable server? Would you be willing to put these online if you still have them available? Do you wish to comment publicly on the apparent disappearance of the sources? Scroll up for the context of the reproducibility of scientific research papers.

jcburley commented 4 years ago

g77 wouldn't have had separate, maintained tarballs in 2006-2008, as gfortran replaced it in the GCC distribution.

If you can find a GCC tarball prior to 4.0 (say, 3.x), that should include the g77 sources, which (IIRC) were shipped with GCC, but I can't remember (nor easily find info on) just which GCC version was the first to incorporate g77, as opposed to g77 being provided as a sort of add-on or plugin tarball.

In the late '90s, the EGCS project forked from GCC and later replaced GCC. During this time, I think g77 was included (rather than a separate tarball). So, that's another possible source for g77 source code: an EGCS tarball.

I left some information at http://www.kilmnj.com/, though that site is very, very old.

As for g77-0.5.8 or earlier tarballs, circa 1995 or prior, it is possible I might have them lying around on an old filesystem somewhere, but I doubt they'll be of much use. Let me know if you want me to do some spelunking in that regard.

I assume you've checked with the FSF and asked about their archives; if not, definitely do so. They used to keep lots of older versions of distributions on "Savannah", I think is/was the name of the system; possibly it, or whatever has replaced it, has older GCC/EGCS/g77 distributions on it.

Sorry I can't be of much more help!

khinsen commented 4 years ago

Thanks @jcburley, that was helpful. The GNU FTP server has gcc-3.4.6 which contains g77: https://ftp.gnu.org/gnu/gcc/gcc-3.4.6/

broukema commented 4 years ago

@jcburley Thanks! :) gcc-3.4.6 looks like exactly what is relevant in this particular case. So the only thing that was missing was user knowledge (by me, anyway) about the relations between g77, gcc and gfortran in terms of tarball archiving.

broukema commented 4 years ago

@pdebuyl @khinsen @oliviaguest @rougier I updated the paper to take into account the points above. As you can see briefly stated in the paper, I had a go at compiling g77-3.4.6 within gcc-3.4.6, without success. Even if it had succeeded, I'm not sure if it would have been worth it to continue in that direction.

Any chance of this submission being allocated a reviewer? My paper seems to be the only one lacking colourful tags :) - even the '01 Request' tag is missing...

oliviaguest commented 4 years ago

I'm not sure what's going on, @broukema. @khinsen, are you taking care of this? If so, I respect your judgment and will let you deal with this; if not, please let me know how/if I can help.

khinsen commented 4 years ago

@oliviaguest I jumped into this thread because of the g77 discussion which is related to a problem I had to deal with myself a while ago. I was under the impression that a review was going on already, which apparently isn't the case! Sorry @broukema!

khinsen commented 4 years ago

This is basically a report on a failed reproduction attempt, for technical reasons (C-Fortran interfacing, g77). So... no need to understand anything about astronomy! Which is good news because ReScience is a bit short of astronomers these days.

@pdebuyl Would you be available to edit this contribution? I'd then propose myself as a reviewer, since I have had to deal with similar issues myself in the past.

pdebuyl commented 4 years ago

ok :-)

@khinsen, would you review this article ?

khinsen commented 4 years ago

@pdebuyl Now that you ask ... ;-) With pleasure!

khinsen commented 4 years ago

@pdebuyl @broukema Here comes my review!

I enjoyed reading this paper, which stands out in several respects:

  1. The work to be reproduced was carefully done in 2008 to facilitate reproduction, a rare choice at the time.

  2. The paper does not merely describe a reproduction attempt, but also provides a thoughtful analysis of why it failed. These causes are still with us, and software written in 2020 can easily suffer the same fate if no precautions are taken.

  3. The paper points out issues that are rarely discussed in the context of reproducibility, such as access restrictions to published resources for residents of certain countries.

I have two comments/suggestions:

  1. The statement that "several popular GIT repository servers block access to scientists and other residents of several territories" is not backed up by reference 11 (the GitHub terms of service). First, reference 11 is only about one service, not several. Second, whereas the terms of service do state that "users may only access and use GitHub.com in compliance with applicable law, including U.S. export control and sanctions laws", this rule is not enforced in any way at this time, to the best of my knowledge. Anyone anywhere in the world has read access to public GitHub repositories, unless that access is blocked at some other level. So the correct statement should be "at least one popular GIT repository server reserves the right to block access..." (which is of course bad enough).

  2. The author suggests that a more serious resurrection attempt of the code should best proceed by porting the Fortran 77 code to Fortran 2008, in order to profit from the standardized C interface that newer Fortran standards include. If the goal is merely resurrection (as opposed to continued development), I would try something more lightweight: convert the Fortran code to C using f2c. In my experience with similar issues, f2c's C interfacing conventions are very close to those of g77. A side benefit would be a simplification of the build process for users, who would not require any Fortran compiler.

khinsen commented 4 years ago

I forgot a cosmetic detail: according to ReScience C conventions, the article title should be prefixed by "[¬Rp]" (for "failed reproduction attempt").

pdebuyl commented 4 years ago

Thank you @khinsen for the review.

@broukema the requests from @khinsen are rather minor. In the case of "1" though, I believe that it requires an update to the paper. In the case of "2", there is more room for interpretation of course.

Let me know about your reply.

broukema commented 4 years ago

Dear Pierre, Konrad,

Thanks for your comments. I have revised the article in response to the three comments.

  1. Git repository bans: I've updated the comment on git repositories being blocked with some wording changes and more references, adding direct evidence of the blocks. I've included direct statements by Github, Atlassian and Gitlab personnel about blocking access based on geographical location; and media evidence of software developers being blocked. I've adjusted the date to 2018 and 2019 to match the dates in the references; and I described the 2020 situation as "presumably" continuing. I've included a quote by the IASGE team, which has overlapping concerns with those of the reproducibility community.

  2. f2c: This is a useful suggestion that I hadn't thought of - thanks! If/when I (or someone else) get(s) around to either modernising the code or resurrecting it as a check against alternative code, that will be a useful alternative to consider. I've added a brief comment on the idea.

  3. Title - I inserted [¬Rp] - the LaTeXing script seems to have handled the ¬ = '\xac' character without complaining.

Please find the revised pdf at:

The differences can be seen by cloning the source and doing a git diff:

git clone https://github.com/broukema/RBG08_rep
cd RBG08_rep
git diff 4d76925..d89795b

or through a web interface:

Cheers Boud

broukema commented 4 years ago

Vicky Steeves - Project Lead on IASGE - says that the block is still in place: https://octodon.social/@vickysteeves/104421428720311912

khinsen commented 4 years ago

@broukema Thanks for the update. I appreciate in particular the added reference on point 1, and then again in particular the work by IASGE, who I had never heard of before.

However, after reading all those references, my impression remains that "Anyone anywhere in the world has read access to public GitHub repositories, unless that access is blocked at some other level." All the reports about blocked access refer either to private repositories, or to contributing to public ones. What the services seem to block is the access to accounts, not to publicly visible contents. Which, again, is bad enough and needs a lot more discussion in the scientific community (I hope IASGE will contribute to that), but it's not the same as blocking all access.

Finally, shifting to nitpicking mode, the author of ref. 10 is displayed as "I. S. Council".

pdebuyl commented 4 years ago

@khinsen even though we cannot (from France or Belgium) verify the details of the bans themselves, the restrictions on developer accounts are referenced in the article. For simple read access, the terms of service mention limitations on exports, so the principle stands.

Do you accept the article modulo the correction for Ref. 10?

broukema commented 4 years ago

@khinsen It's clear that Github claims that the public repositories remain unblocked; I haven't seen a similar nuance claimed by Atlassian (Bitbucket); and I didn't find sources for the details of the Gitlab ban. Vicky Steeves added some more comments on Slack and Gitlab on the Fediverse thread: https://octodon.social/@vickysteeves/104421428720311912 (You can create an account on any Fediverse (Mastodon, Pleroma, GNU Social, Diaspora*) server to participate in the discussion - choose a community server that you are comfortable with - or set up your own server with your preferred fork or your preferred Activitypub compatible software package.)

I think that concrete peer-reviewed type research into the bans, along the lines of the Berkman Klein Center for Internet & Society Wikipedia Censorship research, would be interesting (with the twist that the government blocking access to citizens of X is not the government of X) - but it's beyond the scope of this paper. Anecdotes (or impressions) and media interviews don't carry the same weight as peer-reviewed research.

I've inserted the word "partially", which I agree is justified given the sources.

And thanks for spotting the author error.

@pdebuyl, @khinsen - I have uploaded version 216d67f1 of my paper with these changes:

https://github.com/broukema/RBG08_rep/commit/216d67f148d59716118d804cacd1d23aae6793b3

khinsen commented 4 years ago

@pdebuyl First things first: I am happy with the current state of the paper!

@broukema Thanks for the updates. I certainly agree that this question is outside of the scope of this paper. My intention was not to see it treated more exhaustively, but to make sure that readers don't get a wrong impression.

broukema commented 4 years ago

@khinsen "My intention was ... to make sure that readers don't get a wrong impression." Sure - I appreciate having an alert reviewer - I don't like having my name attached to statements that could be misleading.

What's interesting at the meta-level is that this public, git-repository-issue based refereeing procedure allows the subtlety of Wikipedia-style improvement in the quality of the produced text, while differing in the sense that once the article is published, it's a fixed version of record, in contrast to Wikipedia articles, in which articles that have too much popular interest and too few serious maintainers can have quality that decays with time. Long-term preservation of the reviewing/editing procedure here at ReScience C is not guaranteed (as far as I know), in contrast to Wikipedia, but still much better than the case in traditional journals, in which case the reviewing/editing procedure is private and unlikely to ever become public unless there's a leak or a legal case or some similar highly exceptional situation. I also assume that long-term preservation of the reviewing/editing procedures - i.e. some sort of export of the full ReScience C issues and other structure beyond the git repositories - scholarly ephemera - is something aimed for in the long term. Preservation of the git repositories themselves is trivial - if each editor keeps his/her repository up-to-date then there'll be a high level of redundancy. A temporary hack for the ephemera would be to run some sort of spider script that downloads and archives all the html content of ReScience C, e.g. once a day or once a week. Or request automatic html archiving at Archive.org, which I think is generally willing to cooperate with reasonable requests for regular archiving. Anyway, these are all meta issues beyond the topic of this article. :)
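The "temporary hack" of periodically saving the HTML could look roughly like the sketch below. It is only an illustration under stated assumptions: it saves a single page per call rather than crawling links, and the function names, the file-naming scheme and the injectable `fetcher` argument are all hypothetical choices rather than anything ReScience C uses.

```python
import datetime
import pathlib
import tempfile
import urllib.request


def fetch(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def archive_page(url, out_dir, fetcher=fetch, today=None):
    """Save a dated snapshot of `url` under `out_dir` and return the file path."""
    today = today or datetime.date.today()
    # Turn the URL into a flat, filesystem-safe file name.
    safe = url.split("://", 1)[-1].strip("/").replace("/", "_")
    path = pathlib.Path(out_dir) / f"{today.isoformat()}_{safe}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetcher(url))
    return path


if __name__ == "__main__":
    # Offline demonstration with a stub fetcher instead of a real download.
    with tempfile.TemporaryDirectory() as tmp:
        saved = archive_page("https://example.org/some/page/", tmp,
                             fetcher=lambda url: b"<html></html>")
        print(saved.name)
```

Run once a day or once a week from a scheduler, with the default `fetcher`, this would accumulate dated copies of each page it is pointed at. A real spider would also need to follow links and respect the server's rate limits.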

pdebuyl commented 4 years ago

Thank you @khinsen and @broukema

I will proceed with publication in the coming days. @broukema at some point I will file a pull request (or send a patch by email) to update the manuscript with the metadata (DOI and dates).

I opened an issue to track the question of archival of github metadata https://github.com/ReScience/ReScience/issues/90

broukema commented 3 years ago

@pdebuyl Thanks. Either a pull request or a patch by email would be fine.

pdebuyl commented 3 years ago

Hi @broukema

Sorry for the delay to publication, your paper is not published :-)

Thanks @khinsen for the review.

pdebuyl commented 3 years ago

"not published" -> "now published", sorry

pdebuyl commented 3 years ago

Link https://doi.org/10.5281/zenodo.3943750

The paper will appear on the webpage of ReScience after merge of the bibliography request and update of the website.

broukema commented 3 years ago

hi @pdebuyl I missed the 5-days-ago pull request, sorry. I merged your pull requests and a proofread commit. Since ReScience C presumably does not have professional (paid) proofreaders (organisationally/legally difficult, given the ReScience C zero budget: Rougier, Hinsen et al. 2017; Rougier & Hinsen 2019), I hope that this will be accepted by the Editor. :)

This raises a minor point possibly overlooked in the two above papers about ReScience C - reviewers are not normally expected to bother about typesetting and style details - they are expected to judge the scientific validity; notes on style, English quality, and typesetting improvements are a minor option in a review. Given the default recommendation to use LaTeX, this may not be a big problem in practice, but could become bigger if authors used to non-LaTeX formats or less experienced in article authorship and scientific English submit articles. This implies that a higher burden of document typesetting/proofreading on author(s) + reviewer(s) + editor(s) exists in this situation than in a funded journal. My impression is that, at least so far, this is not a big problem in ReScience C.

pdebuyl commented 3 years ago

Hi @broukema, the paper is published already. Regarding the DOIs, I thought that you had intentionally preferred links to ADS or to arXiv over the DOI.

We had a case of "typo-fixing" paper correction recently, I'll look into that.

ReScience has no financial resources but I have not found proofreading and typesetting to be more convincing in traditional journals.

pdebuyl commented 3 years ago

PS: regarding the update of the paper, I'll wait to see how it turns out, as there is an issue with pre-reservation of DOIs (see https://github.com/ReScience/ReScience/issues/91)

broukema commented 3 years ago

hi @pdebuyl

Live URLs in references:

You're an editor of ReScience C, not me; and the oadoi.org vs dx.doi.org issue is a broader issue, not specifically for my paper alone. It might be better to first raise the issue for discussion among the editors and decide whether this should be an author decision, or if there should be a ReScience C policy either enforcing or banning one option or the other.

"proofreading and typesetting to be more convincing in traditional journals."

Astronomy journals are usually quite good at this, and tend to identify small errors that can confuse or annoy readers but were missed by the authors. Sometimes the proofreaders/typesetters insert errors, but it's rare.

Version of record:

Looking at your list at https://github.com/ReScience/ReScience/issues/91 (1) ... (13), it's hard to see how the proofreading changes in this paper qualify under any of those. From a printing era point of view, making a few links properly clickable is meaningless; and fixing some hyphens and an en dash will only affect novice readers whose lack of knowledge in *nix compiling would probably make the affected sentence difficult to understand with or without the correction.

Bottom line: I'm leaving this to editorial discretion. :)

pdebuyl commented 3 years ago

Thank you for the feedback. Regarding ADS, I was confused because the bibfile has the ADS informational entries from ADS.

Concerning the correction, the only issue is to make the upload on Zenodo in such a way that the DOIs of both versions are linked through Zenodo's joined DOI feature, in combination with the fact that we need to reserve the DOI to include it in the paper.

khinsen commented 3 years ago

Jumping in: I got an answer from Zenodo and will try the procedure tomorrow with our first paper update. Then I will report back here.

khinsen commented 3 years ago

@pdebuyl See here: https://github.com/ReScience/ReScience/issues/91#issuecomment-659228614

pdebuyl commented 3 years ago

Thank you @khinsen I'll proceed in the same way. @broukema this time I'll ask you for a proofread before finalizing the submission.

broukema commented 3 years ago

@pdebuyl

Please let me know if there are any more changes (commits) you need beyond commit d4cc1fc:

pdebuyl commented 3 years ago

Hi @broukema I merged your update. I attach the pdf to this message so that you can check it before it goes to zenodo. The pdf already has the new reserved DOI. Roukema_ReSciC2020.pdf

broukema commented 3 years ago

@pdebuyl Please go ahead and publish!

pdebuyl commented 3 years ago

The update is live at https://doi.org/10.5281/zenodo.3956058

PR pending at your repo, ReScience/articles and soon the website (which is waiting for more PRs).

broukema commented 3 years ago

https://doi.org/10.5281/zenodo.3956058 was not yet live - I assume there's some delay between a DOI provider issuing a DOI and DOI resolvers knowing/agreeing that the identifier is valid. The Zenodo link is live: https://zenodo.org/record/3956058 .

I merged the pull request - thanks. :)

broukema commented 3 years ago

Submitted to ArXiv with https://arxiv.org/list/cs.DL/recent as a primary classification and https://arXiv.org/list/astro-ph.CO/recent as a secondary classification. This should appear on ArXiv on Monday 27 July 2020 if the moderator(s) approve it.

pdebuyl commented 3 years ago

I'll close the issue when you update us about the arxiv version, it is interesting to know how it goes through.

broukema commented 3 years ago

Well, a paper that I hope should satisfy modern standards of reproducibility did just get through ArXiv:

What the reviewer(s) think remains to be seen, though. The national government health agencies that apparently found a cure against Poisson statistics might not be happy with the paper, but hopefully they won't get to choose the reviewer(s).

Anyway, I'll hopefully update this on Monday. :)

broukema commented 3 years ago

The ArXiv moderators have put this paper (ReScience C 6 (2020) #41) on hold: https://arxiv.org/help/submit_status#on_hold . See the link for the possible reasons. The decision could take a few days. If there's a rejection, then I'll propose a draft email for the appeal here, because this most likely concerns the reputation of ReScience C as a scientific research journal, not just my paper in particular.

Since the moderators are reasonably likely to read the discussion here, maybe it would be a good idea to update the "03 - accepted" and "04 - published" labels, to make the article's status clearer.

I've also just noticed at https://rescience.github.io/read/ another point about the article bibliometry parameters. Feel free to branch this off as a separate issue. Most journals' "issue" number is redundant, and is not normally used in bibliographies (at least in astronomy). Having just two numbers: volume and page number, reduces the chance of ambiguity. I'm speculating that this is the reason why issue numbers are redundant, but still useful. The issue numbers matter for physical libraries and physically printed journals - "We didn't receive issue 23! Someone removed issue 7 from the library!" But even with BibTeX/Biber references, errors are still made in identifying articles. It's easier to check two numbers than three. Looking at https://rescience.github.io/read, it is clear that the issue number is not redundant.

The Zenodo metadata for my article presently say "Published in: ReScience C: 6 pp. #11 (1)". Given that there also exists NeurIPS 2019 Reproducibility Challenge, Koustuv Sinha et al, https://zenodo.org/record/3818627, which says "Published in: ReScience C: 6 pp. #11 (2)", the issue number is needed. (Even though in practice, the DOI and Zenodo ID will probably be the only IDs that really get used by people wishing to read the article.)

There are some journals who have modified their article ID system once in their history, but obviously, it's best to do this very rarely - once in a century or so :). For example, Annalen der Physik reset its volume numbers to 1 in 1992 (look for ADP and ANP). The change I would propose would be to make the issue numbers redundant. Don't remove them, just make them unnecessary for identifying the article. So for volume 6, right now all 24 articles would have 24 different "page" numbers. So mine would probably become 6, 2, #24, where "2," is redundant. Doing this with backdating wouldn't be a problem: articles published prior to the change would have multiple (two) equivalent overall IDs; articles after the change would only follow the new system. Just a suggestion...
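To make the ambiguity and the proposed fix concrete, here is a small sketch. It reads the two Zenodo strings quoted above as (volume 6, issue 1, article #11) and (volume 6, issue 2, article #11), which is an interpretation; the third entry and both function names are hypothetical.

```python
from collections import Counter

# (volume, issue, article number) triples. The first two follow my reading of
# the Zenodo metadata quoted above ("6 pp. #11 (1)" and "6 pp. #11 (2)");
# the third is a hypothetical extra article.
articles = [(6, 1, 11), (6, 2, 11), (6, 1, 12)]


def ambiguous_short_ids(ids):
    """Return the (volume, number) pairs shared by more than one article."""
    counts = Counter((vol, num) for vol, _issue, num in ids)
    return sorted(pair for pair, n in counts.items() if n > 1)


def renumber(ids):
    """Assign each article a volume-wide sequential number, keeping the issue."""
    next_number = Counter()
    result = []
    for vol, issue, _num in sorted(ids):
        next_number[vol] += 1
        result.append((vol, issue, next_number[vol]))
    return result


if __name__ == "__main__":
    print(ambiguous_short_ids(articles))            # collisions without the issue
    print(ambiguous_short_ids(renumber(articles)))  # none after renumbering
```

With volume-wide numbering the issue field is still recorded but is no longer needed to tell two articles apart, which is the sense of "redundant" intended above.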

pdebuyl commented 3 years ago

Thank you for the update. I would be curious to know the reason and, as you write, it is important for ReScience to be valued by the wider scientific community.

broukema commented 3 years ago

@pdebuyl Thanks for updating the tags! :)

We'll only get a reason if the article is rejected - so I'm hoping that we don't get a specific reason for the delay... :) For the moment the article is still on hold.

broukema commented 3 years ago

@rougier I didn't have any problems literally submitting this paper to ArXiv, it's rather a problem of having the paper accepted. The paper has been put on hold. The reasons could "[range] from questions about proper classification, pending moderator approval, presentation issues, copyrighted PDF, etc., to editorial concerns." An oversize submission is clearly not the reason, since the paper is just a plain text single .tex file (+ style file). "Proper classification" is usually dealt with automatically - the moderators (unpaid volunteers) add a category/subcategory or shift the category/subcategory without discussion with the submitter. "Copyrighted pdf" is unlikely - it's clear that the paper is CC-BY and the text is fully original. "Presentation issues" - doesn't sound likely.

So my guess is that the question is in the group of reasons "moderator approval/editorial concerns" - whether ReScience C counts as a research journal and this paper as a research paper.

The fact that we're in Northern temperate zone academic summer holidays most likely adds to the delay.

Anyway, for the moment we still have to be patient. This is the first time I've ever had a paper on hold at ArXiv for more than 1-2 working days.

As I say above, I hope that we will not get a reason for the delay. If we do get a reason, then that will mean a rejection. If there's a rejection, then I'll invite discussion here on whether to appeal and what to put in the appeal message - see https://arxiv.org/help/moderation .