Open nlisgo opened 1 year ago
Step 1 is partially incorrect for this preprint, because it is on Research Square, not bioRxiv. Exeter will be providing the XML, and it would be good to confirm whether this should be uploaded to the enhanced-preprints-data repo (under a different folder for the separate DOI prefix).
@fred-atherden This issue template is not adequate for this type of preprint. I will discuss this with @QueenKraken tomorrow. @fred-atherden do you have an ETA on when the XML may become available?
What is the DOI for the manuscript on Research Square?
Added above as well @nlisgo https://doi.org/10.21203/rs.3.rs-2200020/v3
I should be able to provide a MECA package in the next couple of days - Exeter will deliver it later today, but there may potentially be issues (it's the first of its kind under this new process).
@nlisgo, PR for adding this one is up - https://github.com/elifesciences/enhanced-preprints-data/pull/33.
(I also wonder if Steps 3 and 4 need a rethink given Sciety won't have the research square preprint?)
@fred-atherden we are in the process of transitioning away from the sciety docmaps to data-hub. I'm not sure what the implications are for the source of the public reviews given your comment?
> Sciety won't have the research square preprint?
We are also moving away from the enhanced-preprints-data git repo but hope to soon provide you with an s3 bucket to upload the meca file to. Can we expect to be able to extract the xml and assets from the meca file with the script: https://github.com/nlisgo/enhanced-preprints-s3-sync/blob/master/scripts/other_extract_mecas.sh
I'm mostly thinking about these lines: https://github.com/nlisgo/enhanced-preprints-s3-sync/blob/master/scripts/other_extract_mecas.sh
Thanks @nlisgo. I'm not certain what the implications are here either. The docMaps that I've seen (from Sciety and data-hub) have links to webpages on Sciety and hypothes.is and my understanding is that (one of these) is where EPP is pulling the content from, and that this is the output of the Editorial team posting via Kotahi.
Perhaps @BlueReZZ can advise whether @acollings team should still attempt to post public reviews via Kotahi for preprints that aren't on bioRxiv/medRxiv.
Regarding the bash script - the DOI is currently being used to determine an output directory (and some filenames) for the process. bioRxiv/medRxiv DOIs tend to have only one forward slash, so the folder structure in enhanced-preprints-data is currently consistent. However, we cannot make this assumption about the DOIs for all preprints we'll be consuming. As I understand it, all DOIs will have at least one forward slash (a prefix assigned to an organisation or publication, such as eLife or Cold Spring Harbor, and a suffix, separated by `/`), but there can be many more (`/` can be added by the publisher in the suffix). The DOI here has two - `10.21203/rs.3.rs-2200020/v3`.
If you want to maintain a consistent "folder" structure in the s3 bucket, then you may want to consider other potential approaches. What I've attempted to do here, which is one potential approach, is to split any DOI into a prefix and a suffix, with the slashes in the suffix replaced.
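As a rough illustration of that prefix/suffix split, here is a minimal shell sketch. The function name `doi_to_path` and the choice of `_` as the replacement character are assumptions for illustration only, not what the PR or the existing script does.

```shell
#!/bin/sh
# Hypothetical helper: turn a DOI into a two-level "folder" path.
# Runs in a subshell (parentheses) so its variables do not leak.
doi_to_path() (
  prefix="${1%%/*}"   # everything before the first slash
  suffix="${1#*/}"    # everything after the first slash
  # Replace any remaining slashes in the suffix; "_" is an arbitrary choice.
  printf '%s/%s\n' "$prefix" "$(printf '%s' "$suffix" | tr '/' '_')"
)

doi_to_path "10.21203/rs.3.rs-2200020/v3"   # Research Square: two slashes
doi_to_path "10.1101/2022.10.28.514241"     # bioRxiv: one slash
```

One caveat with any replacement character: a suffix that already contains that character could collide with another DOI, so the mapping is only safe if the character is known not to appear in the suffixes being handled.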
The other thing to mention - which isn't necessary at this stage but will become so once supplementary files are properly supported on EPP - is that not all assets are currently extracted from the meca file - it looks like only images are. meca files have all the assets required for the article listed in the `manifest.xml` file.
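To make the manifest point concrete, here is a small sketch that lists every path referenced by an `href` attribute in a manifest. The sample manifest is illustrative only (its ids, types and paths are made up), and `list_manifest_assets` is a hypothetical name. A real implementation would probably use a proper XML parser rather than `grep`.

```shell
#!/bin/sh
# Hypothetical helper: print every href value found in a manifest file.
list_manifest_assets() {
  grep -o 'href="[^"]*"' "$1" | sed -e 's/^href="//' -e 's/"$//'
}

# Illustrative manifest; item ids, types and file paths are invented.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
<manifest>
  <item id="article" type="article">
    <instance href="content/example.xml" media-type="application/xml"/>
  </item>
  <item id="fig1" type="figure">
    <instance href="content/fig1.tif" media-type="image/tiff"/>
  </item>
  <item id="supp1" type="supplement">
    <instance href="content/supp1.pdf" media-type="application/pdf"/>
  </item>
</manifest>
EOF

list_manifest_assets "$manifest"
rm -f "$manifest"
```

Extracting everything the manifest lists, rather than only files with image extensions, would pick up supplementary files without further changes to the script.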
BTW if you are going to continue using the DOI in this script, for this line:
doi=$(cat $tmpDir/$xmlFile | sed 's/xmlns=".*"//g' | xmllint -xpath 'string(/article/front/article-meta/article-id)' -)
Specifying the article-id with the attribute `pub-id-type="doi"` would be more robust:
doi=$(cat $tmpDir/$xmlFile | sed 's/xmlns=".*"//g' | xmllint -xpath 'string(/article/front/article-meta/article-id[@pub-id-type="doi"])' -)
Since there can be numerous different article-ids defined here.
> Perhaps @BlueReZZ can advise whether @acollings team should still attempt to post public reviews via Kotahi for preprints that aren't on bioRxiv/medRxiv.
Biophysics Colab have been doing this with no problems and I think the UI configurations for the eLife instance of Kotahi are the same. As long as the DOI and the reviews are present they should get posted to hypothesis and Sciety will ingest them.
Thanks Paul!
By the sounds of it, we can completely ignore my comment about Sciety and the steps above are still applicable (aside from step 1).
@BlueReZZ, sorry, just to confirm on this:
> As long as the DOI and the reviews are present ...
We have a separate one in the pipeline from arXiv, who mint DOIs via DataCite instead of Crossref. Is that OK as well, or does that cause extra complication?
(and apologies for co-opting this ticket with these questions - feel free to hide my comments Nathan/Ash).
Sciety only supports DOIs minted with Crossref, so arXiv would not currently work. The work to add more preprint servers was planned for Q3 this year, but some of it was brought forward, and Sciety now supports all Crossref-minting servers like Research Square, OSF etc., but not yet those using DataCite.
Thanks Paul. Is there a workaround in the meantime to incorporate reviews for preprints on arXiv?
We'd have to explore this with Mark and the Sciety Team as it will need to be on Sciety to get any further into the EPP chain. They have a discovery session every Tuesday so I'll put it on the agenda to see if there's a workaround for arXiv.
@nlisgo, more comments on the script (or rather, more generic info that might be applicable here) - I don't know if you're thinking about versioning at this stage, and how that might affect where/how assets are being stored, but it's worth being aware (if you aren't already) that different preprint servers have different policies around DOIs - for example, bioRxiv have the same DOI for every version, whereas Research Square mint a new DOI for each version (https://doi.org/10.21203/rs.3.rs-2200020/v1, https://doi.org/10.21203/rs.3.rs-2200020/v2, https://doi.org/10.21203/rs.3.rs-2200020/v3) - and we should avoid trying to derive some assumed convention or meaning from the DOI string itself (the `v3` in this one), as any perceived convention can be changed at any point by those who mint them.
Hope that's helpful!
Thanks @fred-atherden. I'm just catching up with emails today, and our intention for eLife RPPs is that they would be stored with a prefix more like `${publisher}/${msid}/${version}/`, e.g. `elife/80494/1/article.meca` or similar. Basically, stored as they would be exposed, as determined by the docmap. Other journals'/groups' policies will hopefully be able to fit into a similar pattern, but we've yet to do proper discovery on that. The import process will then be based on parsing the meca's `manifest.xml` to find the article, convert it and send it to EPP.
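For illustration, the `${publisher}/${msid}/${version}/` pattern could be assembled like this. This is only a sketch: `meca_key` is a hypothetical name, and the check that msid and version are purely numeric is an assumption based on the example values, not a confirmed rule.

```shell
#!/bin/sh
# Hypothetical helper: build a storage key from publisher, msid and version.
meca_key() {
  # Assumption: msid and version are non-empty and digits only.
  for n in "$2" "$3"; do
    case "$n" in
      ''|*[!0-9]*) echo "msid and version must be numeric" >&2; return 1 ;;
    esac
  done
  printf '%s/%s/%s/article.meca\n' "$1" "$2" "$3"
}

meca_key elife 80494 1
```

Because the key is built from eLife's own identifiers, it does not depend on how many slashes the preprint DOI contains or on whether the preprint server mints a new DOI per version.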
Under the interim S3 import scheme (that's about to roll out over the next few days), we're importing any files with the prefix `data/` and the suffix `.xml`. It's not very nuanced, but while we're not currently storing anything but images and XML, it will work effectively, regardless of how many subpaths there are.
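The prefix/suffix rule can be sketched as a simple pattern match. The function name `is_importable_key` and the sample keys are hypothetical. In a shell `case` pattern `*` also matches `/`, which is why the number of subpaths does not matter.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for keys that start with "data/"
# and end with ".xml", whatever lies in between.
is_importable_key() {
  case "$1" in
    data/*.xml) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative keys only.
for key in \
  "data/10.1101/2022.10.28.514241/2022.10.28.514241.xml" \
  "data/10.21203/rs.3.rs-2200020/v3/article.xml" \
  "data/10.1101/2022.10.28.514241/fig1.tif" \
  "other/article.xml"
do
  if is_importable_key "$key"; then
    echo "import: $key"
  else
    echo "skip:   $key"
  fi
done
```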
Thanks Scott. I think using eLife's own msids is the best approach here so that sounds good. Once you've confirmed these details it will be useful for me to know so that I can let Exeter know how to name/structure any meca package for non-bioRxiv and non-medRxiv preprints (such as this one).
Ticket created to address issue with non-biorxiv manuscripts not displaying: https://github.com/elifesciences/enhanced-preprints-issues/issues/581
I'm having problems posting the reviews:
https://elifesciences.slack.com/archives/C01SV25KNS2/p1680085713142409
Also blocked by issues in https://github.com/elifesciences/enhanced-preprints-biorxiv-xslt/pull/18
Fix for above issues in https://github.com/elifesciences/enhanced-preprints-data/pull/65
Looks like DocMaps isn't returning the reviews:
Although they appear on Sciety - https://sciety.org/articles/activity/10.21203/rs.3.rs-2200020/v3.
@HazalCiplak is that expected at this stage because this is a non-bioRxiv preprint?
(Looks like 86324 also has the same issue)
Hi @fred-atherden,
Thank you for raising this. Yes, it is related to this being a non-bioRxiv preprint. We have a related ticket: https://github.com/elifesciences/data-hub-issues/issues/655 but it was blocked because we did not have any examples. I am now marking it as unblocked and will start working on it as soon as possible. I will let you know when development is finished.
Nice one - thanks Hazal!
This includes reviews now: https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v1/by-publisher/elife/get-by-manuscript-id?manuscript_id=85921 Please let me know if you notice any issues in this docmap. Thanks again for raising it!
Nice one - thanks Hazal!
"msas": "Cell Biology", "Neuroscience"
"msid": "85921"
"version": "1"
"preprintDoi": "10.21203/rs.3.rs-2200020/v3"
"articleType": "Reviewed Preprint"
"status": "Published from the original preprint after peer review and assessment by eLife."
"Reviewed Preprint posted": "2023-04-28"
"Sent for peer review": "2023-01-11"
"Posted to Research Square": "2023-01-13" (link: "Go to Research Square": "https://www.researchsquare.com/article/rs-2200020/v3")
[PLACE PDF URL HERE WHEN AVAILABLE] See step 7
https://doi.org/10.21203/rs.3.rs-2200020/v3
Step 1. Inform bioRxiv
Who can help: @QueenKraken, @nlisgo, @scottaubrey
or (only one should be ticked. remove other from description.)
Send the following email to Ted and wait for his reply.
Step 2. Create preview of manuscript
Who can help: @fred-atherden, @nlisgo, @scottaubrey
Pull request: https://github.com/elifesciences/enhanced-preprints-data/pull/59
Instructions
```
$ git clone git@github.com:elifesciences/enhanced-preprints-data.git
$ cd enhanced-preprints-data
$ git checkout -b import-rs.3.rs-2200020 origin/master
$ ./scripts/fetch_meca_archive.sh rs.3.rs-2200020 incoming/
$ ./scripts/extract_mecas.sh incoming/ data/
$ rm -rf incoming/
$ git add .
$ git commit -m 'import-rs.3.rs-2200020'
$ git push -u origin import-rs.3.rs-2200020
```

Create pull request: https://github.com/elifesciences/enhance/compare/master...import-rs.3.rs-2200020

Merge in after CI passes and reviewing changes. Manuscript should be available for preview shortly afterwards.

An example with multiple:

```
$ for doi in 2022.06.17.496451 2022.10.29.514266; do ./scripts/fetch_meca_archive.sh $doi incoming/; done
$ ./scripts/extract_mecas.sh incoming/ data/
$ rm -rf incoming/
$ for doi in 2022.06.17.496451 2022.10.29.514266; do git checkout --no-track -b "import-$doi" origin/master; git add data/10.1101/$doi/.; git commit -m "import-$doi"; git push origin "import-$doi"; done; git checkout master;
```

Step 3: Awaiting public reviews
Who can help: Editorial team
Example
```
"msas": "Genetics and Genomics", "Neuroscience"
"msid": "84628"
"version": "1"
"preprintDoi": "10.1101/2022.10.28.514241"
"articleType": "Reviewed Preprint"
"status": "Published from the original preprint after peer review and assessment by eLife."
"Reviewed Preprint posted": "2023-01-02"
"Sent for peer review": "2022-10-28"
"Posted to bioRxiv": "2022-11-21" (link: "Go to bioRxiv": "https://www.biorxiv.org/content/10.1101/2022.10.28.514241v1")
Editors:
Reviewing Editor
Michael B Eisen
University of California, Berkeley, United States
Senior Editor
Michael B Eisen
University of California, Berkeley, United States
```

Step 4: Deprecated (no longer necessary)
Step 5: Modify manuscripts.json (no PDF)
Pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/667 #enhanced-preprint comment thread: [PLACE LINK TO COMMENT HERE]
Instructions to modify manuscripts.json
- Visit: https://github.com/elifesciences/enhanced-preprints-client/actions/workflows/publish-manuscript.yaml
- Click: Run workflow
- Complete the form and click Run workflow
- A successful run should result in a new workflow at https://github.com/elifesciences/enhanced-preprints-client/pulls

Example pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/334/files

Once the pull request is merged in it should be available a few minutes later.

Request that a doi
Post the following in #enhanced-preprint:
Step 6: Awaiting search reindex
The search reindex is triggered once an hour. We need the reviewed preprint to be indexed as the search application serves the journal homepage.
Additional info
If needed, the jenkins pipeline to reindex search can be triggered sooner. https://alfred.elifesciences.org/job/process/job/process-reindex-reviewed-preprints/

Step 7: Published! Request PDF generation
#sciety-general comment thread: [PLACE LINK TO COMMENT HERE]
Post the following to the #enhanced-preprint on slack:
Step 8: Add PDF to git repo
Instructions
Download the PDF and rename to `rs.3.rs-2200020.pdf`

Go to: https://github.com/elifesciences/enhanced-preprints-data/upload/master/data/10.21203/rs.3.rs-2200020

Upload the file `rs.3.rs-2200020.pdf` and commit directly to the master branch

Step 9: Add PDF url to manuscripts.json
[PLACE LINK TO PULL REQUEST HERE]
Instructions
- Visit: https://github.com/elifesciences/enhanced-preprints-client/actions/workflows/add-pdf-url-to-manuscript.yaml
- Click: Run workflow
- Complete the form and click Run workflow
- A successful run should result in a new workflow at https://github.com/elifesciences/enhanced-preprints-client/pulls

Example pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/397/files

Once the pull request is merged in it should be available a few minutes later.

Step 10: Done!