elifesciences / enhanced-preprints-import

Enhanced Preprints import system

MSID 85921 DOI rs.3.rs-2200020/v3 #61

Open nlisgo opened 1 year ago

nlisgo commented 1 year ago

"msas": "Cell Biology", "Neuroscience"
"msid": "85921"
"version": "1"
"preprintDoi": "10.21203/rs.3.rs-2200020/v3"
"articleType": "Reviewed Preprint"
"status": "Published from the original preprint after peer review and assessment by eLife."

"Reviewed Preprint posted": "2023-04-28"
"Sent for peer review": "2023-01-11"
"Posted to Research Square": "2023-01-13"
(link: "Go to Research Square": "https://www.researchsquare.com/article/rs-2200020/v3")

[PLACE PDF URL HERE WHEN AVAILABLE] See step 7

https://doi.org/10.21203/rs.3.rs-2200020/v3

Step 1. Inform bioRxiv

Who can help: @QueenKraken, @nlisgo, @scottaubrey

or (only one should be ticked. remove other from description.)

Send the following email to Ted and wait for his reply.

Hi Ted,

Please can you prepare the preprint MECA for 10.21203/rs.3.rs-2200020/v3

Thanks

Step 2. Create preview of manuscript

Who can help: @fred-atherden, @nlisgo, @scottaubrey

Pull request: https://github.com/elifesciences/enhanced-preprints-data/pull/59

Instructions

```
$ git clone git@github.com:elifesciences/enhanced-preprints-data.git
$ cd enhanced-preprints-data
$ git checkout -b import-rs.3.rs-2200020 origin/master
$ ./scripts/fetch_meca_archive.sh rs.3.rs-2200020 incoming/
$ ./scripts/extract_mecas.sh incoming/ data/
$ rm -rf incoming/
$ git add .
$ git commit -m 'import-rs.3.rs-2200020'
$ git push -u origin import-rs.3.rs-2200020
```

Create pull request: https://github.com/elifesciences/enhance/compare/master...import-rs.3.rs-2200020

Merge in once CI passes and the changes have been reviewed. The manuscript should be available for preview shortly afterwards.

An example with multiple manuscripts:

```
$ for doi in 2022.06.17.496451 2022.10.29.514266; do ./scripts/fetch_meca_archive.sh "$doi" incoming/; done
$ ./scripts/extract_mecas.sh incoming/ data/
$ rm -rf incoming/
$ for doi in 2022.06.17.496451 2022.10.29.514266; do git checkout --no-track -b "import-$doi" origin/master; git add "data/10.1101/$doi/."; git commit -m "import-$doi"; git push origin "import-$doi"; done; git checkout master;
```

Step 3: Awaiting public reviews

Who can help: Editorial team

Example

```
"msas": "Genetics and Genomics", "Neuroscience"
"msid": "84628"
"version": "1"
"preprintDoi": "10.1101/2022.10.28.514241"
"articleType": "Reviewed Preprint"
"status": "Published from the original preprint after peer review and assessment by eLife."

"Reviewed Preprint posted": "2023-01-02"
"Sent for peer review": "2022-10-28"
"Posted to bioRxiv": "2022-11-21"
(link: "Go to bioRxiv": "https://www.biorxiv.org/content/10.1101/2022.10.28.514241v1")

Editors:
Reviewing Editor
Michael B Eisen
University of California, Berkeley, United States

Senior Editor
Michael B Eisen
University of California, Berkeley, United States
```

Step 4: Deprecated (no longer necessary)

Step 5: Modify manuscripts.json (no PDF)

Pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/667

#enhanced-preprint comment thread: [PLACE LINK TO COMMENT HERE]

Instructions to modify manuscripts.json

- Visit: https://github.com/elifesciences/enhanced-preprints-client/actions/workflows/publish-manuscript.yaml
- Click: Run workflow
- Complete the form and click Run workflow
- A successful run should result in a new pull request at https://github.com/elifesciences/enhanced-preprints-client/pulls

Example pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/334/files

Once the pull request is merged in, the change should be available a few minutes later.

Request that a DOI be registered

Post the following in #enhanced-preprint:

@Fred can you register a doi for https://elifesciences.org/reviewed-preprints/85921

Step 6: Awaiting search reindex

The search reindex is triggered once an hour. We need the reviewed preprint to be indexed as the search application serves the journal homepage.

Additional info

If needed, the Jenkins pipeline to reindex search can be triggered sooner: https://alfred.elifesciences.org/job/process/job/process-reindex-reviewed-preprints/

Step 7: Published! Request PDF generation

#sciety-general comment thread: [PLACE LINK TO COMMENT HERE]

Post the following to #enhanced-preprint on Slack:

@Ryan Dix-Peek please can you generate a PDF for https://elifesciences.org/reviewed-preprints/85921

Step 8: Add PDF to git repo

Instructions

- Download the PDF and rename it to `rs.3.rs-2200020.pdf`
- Go to: https://github.com/elifesciences/enhanced-preprints-data/upload/master/data/10.21203/rs.3.rs-2200020
- Upload the file `rs.3.rs-2200020.pdf` and commit directly to the master branch

Step 9: Add PDF url to manuscripts.json

[PLACE LINK TO PULL REQUEST HERE]

Instructions

- Visit: https://github.com/elifesciences/enhanced-preprints-client/actions/workflows/add-pdf-url-to-manuscript.yaml
- Click: Run workflow
- Complete the form and click Run workflow
- A successful run should result in a new pull request at https://github.com/elifesciences/enhanced-preprints-client/pulls

Example pull request: https://github.com/elifesciences/enhanced-preprints-client/pull/397/files

Once the pull request is merged in, the change should be available a few minutes later.

Step 10: Done!

fred-atherden commented 1 year ago

Step 1 is partially incorrect for this preprint, because it is on Research Square, not bioRxiv. Exeter will be providing the XML, and it would be good to confirm whether this should be uploaded to the enhanced-preprint/data repo (under a different folder for the separate DOI prefix).

nlisgo commented 1 year ago

@fred-atherden This issue template is not adequate for this type of preprint. I will discuss this with @QueenKraken tomorrow. @fred-atherden do you have an ETA on when the XML may become available?

nlisgo commented 1 year ago

What is the doi for the manuscript on research square?

QueenKraken commented 1 year ago

Added above as well @nlisgo https://doi.org/10.21203/rs.3.rs-2200020/v3

fred-atherden commented 1 year ago

I should be able to provide a MECA package in the next couple of days - Exeter will deliver it later today, but there may potentially be issues (it's the first of its kind under this new process).

fred-atherden commented 1 year ago

@nlisgo, PR for adding this one is up - https://github.com/elifesciences/enhanced-preprints-data/pull/33.

fred-atherden commented 1 year ago

(I also wonder if Steps 3 and 4 need a rethink given Sciety won't have the Research Square preprint?)

nlisgo commented 1 year ago

@fred-atherden we are in the process of transitioning away from the sciety docmaps to data-hub. I'm not sure what the implications are for the source of the public reviews given your comment?

> Sciety won't have the Research Square preprint?

We are also moving away from the enhanced-preprints-data git repo but hope to soon provide you with an s3 bucket to upload the meca file to. Can we expect to be able to extract the xml and assets from the meca file with the script: https://github.com/nlisgo/enhanced-preprints-s3-sync/blob/master/scripts/other_extract_mecas.sh

I'm mostly thinking about these lines: https://github.com/nlisgo/enhanced-preprints-s3-sync/blob/master/scripts/other_extract_mecas.sh

fred-atherden commented 1 year ago

Thanks @nlisgo. I'm not certain what the implications are here either. The docMaps that I've seen (from Sciety and data-hub) have links to webpages on Sciety and hypothes.is and my understanding is that (one of these) is where EPP is pulling the content from, and that this is the output of the Editorial team posting via Kotahi.

Perhaps @BlueReZZ can advise whether @acollings team should still attempt to post public reviews via Kotahi for preprints that aren't on bioRxiv/medRxiv.


Regarding the bash script - the DOI is currently being used to determine an output directory (and some filenames) for the process. bioRxiv/medRxiv DOIs tend to only have one forward slash, so the folder structure in enhanced-preprints-data is currently consistent. However, we cannot make this assumption about the DOIs for all preprints we'll be consuming. As I understand it, all DOIs will have at least one forward slash (a prefix assigned to an organisation or publication, such as eLife or Cold Spring Harbor, and a suffix separated by /), but there can be many more (/ can be added by the publisher in the suffix). The DOI here has two - 10.21203/rs.3.rs-2200020/v3.

If you want to maintain a consistent "folder" structure in the s3 bucket, then you may want to consider other potential approaches. What I've attempted to do here, which is one potential approach, is to split any DOI into a prefix and a suffix, with the slashes in the suffix replaced.
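A minimal sketch of the prefix/suffix split described above, written in Python rather than bash for illustration. The function names, the `_` replacement character and the `data` root are assumptions, not what the script or the PR actually uses.

```python
from typing import Tuple


def split_doi(doi: str, replacement: str = "_") -> Tuple[str, str]:
    """Split a DOI at its first slash and make the suffix safe for use as one path segment.

    The replacement character is an assumption chosen for illustration.
    """
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"expected a DOI with a prefix and a suffix, got {doi!r}")
    # Only the suffix may contain further slashes; flatten them.
    return prefix, suffix.replace("/", replacement)


def storage_dir(doi: str, root: str = "data") -> str:
    """Build a two-level directory (prefix, flattened suffix) under a hypothetical root."""
    prefix, safe_suffix = split_doi(doi)
    return f"{root}/{prefix}/{safe_suffix}"
```

With this approach a bioRxiv DOI and the Research Square DOI in this ticket both end up exactly two levels below the root, regardless of how many slashes the suffix contains.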

The other thing to mention - which isn't necessary at this stage but will become so once supplementary files are properly supported on EPP - is that not all assets are currently extracted from the meca file - it looks like only images are.

meca files have all the assets required for the article listed in the manifest.xml file.
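As a rough illustration of reading every asset from the manifest rather than only images, here is a sketch in Python. The sample manifest is hypothetical and simplified (a real manifest.xml carries namespaces and more metadata, and its attribute names should be checked against a real package), so the code matches elements and attributes by local name only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest>
  <item type="article">
    <instance media-type="application/xml" href="content/article.xml"/>
    <instance media-type="application/pdf" href="content/article.pdf"/>
  </item>
  <item type="figure">
    <instance media-type="image/tiff" href="content/fig1.tif"/>
  </item>
  <item type="supplement">
    <instance media-type="application/zip" href="content/supp1.zip"/>
  </item>
</manifest>"""


def local_name(name: str) -> str:
    """Drop an ElementTree '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def list_assets(manifest_xml: str) -> List[Dict[str, str]]:
    """Return one record per <instance> in the manifest, whatever its media type."""
    root = ET.fromstring(manifest_xml)
    assets = []
    for item in root.iter():
        if local_name(item.tag) != "item":
            continue
        for instance in item:
            if local_name(instance.tag) != "instance":
                continue
            # Accept both a plain href and a namespaced one (e.g. xlink:href).
            href = next((v for k, v in instance.attrib.items() if local_name(k) == "href"), None)
            if href:
                assets.append({
                    "item_type": item.get("type", ""),
                    "media_type": instance.get("media-type", ""),
                    "href": href,
                })
    return assets
```

Extracting by iterating over this list, instead of filtering on image file types, would pick up supplementary files once EPP supports them.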

fred-atherden commented 1 year ago

BTW if you are going to continue using the DOI in this script, for this line:

doi=$(cat $tmpDir/$xmlFile | sed 's/xmlns=".*"//g' | xmllint -xpath 'string(/article/front/article-meta/article-id)' -)

Specifying the article-id with the attribute pub-id-type="doi" would be more robust:

doi=$(cat $tmpDir/$xmlFile | sed 's/xmlns=".*"//g' | xmllint -xpath 'string(/article/front/article-meta/article-id[@pub-id-type="doi"])' -)

Since there can be numerous different article-ids defined here.
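To show why the attribute matters, here is a small Python sketch of the same two lookups using the standard library instead of xmllint. The sample XML is hypothetical: the publisher-id value is invented for illustration, and the document has no default namespace (the bash line strips it with sed before querying).

```python
import xml.etree.ElementTree as ET

# Hypothetical front matter with two article-id elements; the publisher-id
# value is made up, the DOI is the one from this ticket.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="publisher-id">example-publisher-id</article-id>
      <article-id pub-id-type="doi">10.21203/rs.3.rs-2200020/v3</article-id>
    </article-meta>
  </front>
</article>"""


def first_article_id(xml_text: str) -> str:
    """Mimic the unqualified lookup: whichever article-id comes first wins."""
    node = ET.fromstring(xml_text).find("./front/article-meta/article-id")
    return node.text.strip() if node is not None and node.text else ""


def doi_article_id(xml_text: str) -> str:
    """Select only the article-id whose pub-id-type attribute is 'doi'."""
    node = ET.fromstring(xml_text).find("./front/article-meta/article-id[@pub-id-type='doi']")
    return node.text.strip() if node is not None and node.text else ""
```

On this sample the unqualified lookup returns the publisher id, while the qualified lookup returns the DOI regardless of element order.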

BlueReZZ commented 1 year ago

> Perhaps @BlueReZZ can advise whether @acollings team should still attempt to post public reviews via Kotahi for preprints that aren't on bioRxiv/medRxiv.

Biophysics Colab have been doing this with no problems and I think the UI configurations for the eLife instance of Kotahi are the same. As long as the DOI and the reviews are present they should get posted to hypothesis and Sciety will ingest them.

fred-atherden commented 1 year ago

Thanks Paul!

By the sounds of it, we can completely ignore my comment about Sciety and the steps above are still applicable (aside from step 1).

fred-atherden commented 1 year ago

@BlueReZZ, sorry, just to confirm on this:

> As long as the DOI and the reviews are present ...

We have a separate one in the pipeline from arXiv, who mint DOIs via DataCite instead of Crossref. Is that OK as well, or does that cause extra complication?

(and apologies for co-opting this ticket with these questions - feel free to hide my comments Nathan/Ash).

BlueReZZ commented 1 year ago

> We have a separate one in the pipeline from arXiv, who mint DOIs via DataCite instead of Crossref. Is that OK as well, or does that cause extra complication?

Sciety only supports DOIs minted with CrossRef so arXiv would not currently work. The work to add more preprint servers was planned for Q3 this year but some of it was brought forward, and Sciety now supports all CrossRef-minting servers like Research Square, OSF etc. but not those using DataCite yet.

acollings commented 1 year ago

> Sciety only supports DOIs minted with CrossRef so arXiv would not currently work. The work to add more preprint servers was planned for Q3 this year but some of it was brought forward, and Sciety now supports all CrossRef-minting servers like Research Square, OSF etc. but not those using DataCite yet.

Thanks Paul. Is there a workaround in the meantime to incorporate reviews for preprints on arXiv?

BlueReZZ commented 1 year ago

> Thanks Paul. Is there a workaround in the meantime to incorporate reviews for preprints on arXiv?

We'd have to explore this with Mark and the Sciety Team as it will need to be on Sciety to get any further into the EPP chain. They have a discovery session every Tuesday so I'll put it on the agenda to see if there's a workaround for arXiv.

fred-atherden commented 1 year ago

@nlisgo, more comments on the script (or rather, more generic info that might be applicable here). I don't know if you're thinking about versioning at this stage, and how that might affect where/how assets are being stored, but it's worth being aware (if you aren't already) that different preprint servers have different policies around DOIs. For example, bioRxiv have the same DOI for every version, whereas Research Square mint a new DOI for each version (https://doi.org/10.21203/rs.3.rs-2200020/v1, https://doi.org/10.21203/rs.3.rs-2200020/v2, https://doi.org/10.21203/rs.3.rs-2200020/v3). We should avoid trying to derive some assumed convention or meaning from the DOI string itself (the v3 in this one), as any perceived convention can be changed at any point by those who mint them.

Hope that's helpful!
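One way to follow that advice is to treat the DOI as an opaque key and keep the version in an explicit record. The Python sketch below is only an illustration: the `PreprintVersion` type, the in-memory table and `lookup_version` are hypothetical names, while the values come from the metadata at the top of this issue.

```python
from typing import Dict, NamedTuple


class PreprintVersion(NamedTuple):
    msid: str
    version: str


# Explicit records keyed by the full DOI string. In this issue the preprint DOI
# ends in "v3" while the eLife version is "1", so parsing the suffix would mislead.
VERSIONS_BY_DOI: Dict[str, PreprintVersion] = {
    "10.21203/rs.3.rs-2200020/v3": PreprintVersion(msid="85921", version="1"),
}


def lookup_version(doi: str) -> PreprintVersion:
    """Return the recorded version for a DOI without inspecting the DOI's structure."""
    try:
        return VERSIONS_BY_DOI[doi]
    except KeyError:
        raise LookupError(f"no version record for DOI {doi!r}") from None
```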

scottaubrey commented 1 year ago

> @nlisgo, more comments on the script (or rather, more generic info that might be applicable here). I don't know if you're thinking about versioning at this stage, and how that might affect where/how assets are being stored, but it's worth being aware (if you aren't already) that different preprint servers have different policies around DOIs. For example, bioRxiv have the same DOI for every version, whereas Research Square mint a new DOI for each version (https://doi.org/10.21203/rs.3.rs-2200020/v1, https://doi.org/10.21203/rs.3.rs-2200020/v2, https://doi.org/10.21203/rs.3.rs-2200020/v3). We should avoid trying to derive some assumed convention or meaning from the DOI string itself (the v3 in this one), as any perceived convention can be changed at any point by those who mint them.
>
> Hope that's helpful!

Thanks @fred-atherden. I'm just catching up with emails today, and our intention for eLife RPPs is that they would be stored with a prefix more like ${publisher}/${msid}/${version}/ e.g. elife/80494/1/article.meca or similar. Basically, stored as they would be exposed as determined by the docmap. Other journals' or groups' policies will hopefully be able to fit into a similar pattern, but we've yet to do proper discovery on that. The import process will then be based on parsing the meca's manifest.xml to find the article, convert and send to EPP.
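The proposed prefix can be sketched as a small key builder. This is an illustration in Python, not the import code itself; the function name and the validation rules are assumptions, and `article.meca` is the example filename from the comment above.

```python
def meca_key(publisher: str, msid: str, version: str, filename: str = "article.meca") -> str:
    """Build a storage key of the form publisher/msid/version/filename."""
    parts = [publisher, msid, version, filename]
    for part in parts:
        # Each part must be a single, non-empty path segment.
        if not part or "/" in part:
            raise ValueError(f"invalid key segment: {part!r}")
    return "/".join(parts)
```

Because the key is built from eLife's own identifiers, the number of slashes in the preprint DOI no longer affects the layout.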

Under the interim S3 import scheme (that's about to roll out over the next few days), we're importing any files with the prefix data/ and the suffix .xml. It's not very nuanced, but while we're not currently storing anything but images and XML, it will work effectively, regardless of how many subpaths are there.
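The interim rule amounts to a simple filter on object keys. A sketch in Python, with hypothetical example keys (the real bucket layout is not described here):

```python
from typing import Iterable, List


def should_import(key: str) -> bool:
    """Interim rule: import any object whose key starts with data/ and ends with .xml."""
    return key.startswith("data/") and key.endswith(".xml")


def keys_to_import(keys: Iterable[str]) -> List[str]:
    """Filter a listing of object keys down to the ones the interim scheme would import."""
    return [key for key in keys if should_import(key)]
```

The depth of the path between the prefix and the suffix is never inspected, which is why extra slashes in a DOI do not matter under this scheme.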

fred-atherden commented 1 year ago

Thanks Scott. I think using eLife's own msids is the best approach here so that sounds good. Once you've confirmed these details it will be useful for me to know so that I can let Exeter know how to name/structure any meca package for non-bioRxiv and non-medRxiv preprints (such as this one).

nlisgo commented 1 year ago

Ticket created to address issue with non-biorxiv manuscripts not displaying: https://github.com/elifesciences/enhanced-preprints-issues/issues/581

acollings commented 1 year ago

I'm having problems posting the reviews:

https://elifesciences.slack.com/archives/C01SV25KNS2/p1680085713142409

fred-atherden commented 1 year ago

Also blocked by issues in https://github.com/elifesciences/enhanced-preprints-biorxiv-xslt/pull/18

fred-atherden commented 1 year ago

Fix for above issues in https://github.com/elifesciences/enhanced-preprints-data/pull/65

fred-atherden commented 1 year ago

Looks like DocMaps isn't returning the reviews:

https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v1/by-publisher/elife/get-by-manuscript-id?manuscript_id=85921

Result

```json
{
  "@context": "https://w3id.org/docmaps/context.jsonld",
  "type": "docmap",
  "id": "https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v1/by-publisher/elife/get-by-manuscript-id?manuscript_id=85921",
  "created": "2023-01-11T04:40:52+00:00",
  "updated": "2023-01-11T04:40:52+00:00",
  "publisher": {
    "account": {
      "id": "https://sciety.org/groups/elife",
      "service": "https://sciety.org"
    },
    "homepage": "https://elifesciences.org/",
    "id": "https://elifesciences.org/",
    "logo": "https://sciety.org/static/groups/elife--b560187e-f2fb-4ff9-a861-a204f3fc0fb0.png",
    "name": "eLife"
  },
  "first-step": "_:b0",
  "steps": {
    "_:b0": {
      "actions": [
        {
          "participants": [],
          "outputs": [
            { "type": "preprint", "doi": "10.21203/rs.3.rs-2200020/v3" }
          ]
        }
      ],
      "assertions": [
        {
          "item": { "type": "preprint", "doi": "10.21203/rs.3.rs-2200020/v3" },
          "status": "manuscript-published"
        }
      ],
      "inputs": [],
      "next-step": "_:b1"
    },
    "_:b1": {
      "actions": [
        {
          "participants": [],
          "outputs": [
            {
              "identifier": "85921",
              "versionIdentifier": "1",
              "type": "preprint",
              "doi": "10.7554/eLife.85921.1",
              "license": "http://creativecommons.org/licenses/by/4.0/"
            }
          ]
        }
      ],
      "assertions": [
        {
          "item": { "type": "preprint", "doi": "10.21203/rs.3.rs-2200020/v3" },
          "status": "under-review",
          "happened": "2023-01-11T04:40:52+00:00"
        },
        {
          "item": {
            "type": "preprint",
            "doi": "10.7554/eLife.85921.1",
            "versionIdentifier": "1"
          },
          "status": "draft"
        }
      ],
      "inputs": [
        { "type": "preprint", "doi": "10.21203/rs.3.rs-2200020/v3" }
      ],
      "previous-step": "_:b0"
    }
  }
}
```

Although they appear on Sciety - https://sciety.org/articles/activity/10.21203/rs.3.rs-2200020/v3.

@HazalCiplak is that expected at this stage because this is a non-bioRxiv preprint?
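The check made by eye above can be sketched in Python. The `DOCMAP` dict is a trimmed copy of the result shown (steps and action outputs only). The helper names are hypothetical, and the idea that reviews would show up as action outputs whose type is something other than "preprint" is an assumption about the docmap, not something this thread confirms.

```python
from typing import Iterator, List

# Trimmed copy of the docmap returned for manuscript 85921 (steps and outputs only).
DOCMAP = {
    "first-step": "_:b0",
    "steps": {
        "_:b0": {
            "actions": [{"participants": [], "outputs": [
                {"type": "preprint", "doi": "10.21203/rs.3.rs-2200020/v3"},
            ]}],
            "next-step": "_:b1",
        },
        "_:b1": {
            "actions": [{"participants": [], "outputs": [
                {"type": "preprint", "doi": "10.7554/eLife.85921.1", "versionIdentifier": "1"},
            ]}],
            "previous-step": "_:b0",
        },
    },
}


def walk_steps(docmap: dict) -> Iterator[dict]:
    """Yield steps in order, starting at first-step and following next-step links."""
    steps = docmap.get("steps", {})
    step_id = docmap.get("first-step")
    seen = set()
    while step_id in steps and step_id not in seen:  # guard against cycles
        seen.add(step_id)
        step = steps[step_id]
        yield step
        step_id = step.get("next-step")


def output_types(docmap: dict) -> List[str]:
    """Collect the type of every action output across all steps."""
    return [
        output.get("type", "")
        for step in walk_steps(docmap)
        for action in step.get("actions", [])
        for output in action.get("outputs", [])
    ]


def has_non_preprint_outputs(docmap: dict) -> bool:
    """Assumption: reviews would appear as outputs whose type is not 'preprint'."""
    return any(output_type != "preprint" for output_type in output_types(docmap))
```

For the docmap above, every output is of type "preprint", which matches the observation that no reviews are being returned.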

fred-atherden commented 1 year ago

(Looks like 86324 also has the same issue)

HazalCiplak commented 1 year ago

Hi @fred-atherden,

Thank you for raising this. Yes, it is related to this being a non-bioRxiv preprint. We have a related ticket: https://github.com/elifesciences/data-hub-issues/issues/655 but it was blocked because we did not have any examples. I am now marking it as unblocked and will start working on it as soon as possible. I will let you know when development has finished.

fred-atherden commented 1 year ago

Nice one - thanks Hazal!

HazalCiplak commented 1 year ago

This includes reviews now: https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v1/by-publisher/elife/get-by-manuscript-id?manuscript_id=85921

Please let me know if you notice any issues in this docmap. Thanks again for raising it!

fred-atherden commented 1 year ago

Nice one - thanks Hazal!