PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

Bulk download includes junk documents #828

Closed: maxkfranz closed this issue 3 years ago

maxkfranz commented 3 years ago

The bulk download includes documents that should be excluded.

@jvwong, would you list the status values that should be included (submitted, etc.)?

For example, here is a document with status: 'trashed'. Its elements array is also empty.

{
  "id": "7c06498a-7a4d-4a4c-ab3e-6ae82e8d869e",
  "secret": "read-only",
  "organisms": [],
  "elements": [],
  "publicUrl": "/document/7c06498a-7a4d-4a4c-ab3e-6ae82e8d869e",
  "privateUrl": "/document/7c06498a-7a4d-4a4c-ab3e-6ae82e8d869e",
  "citation": {
    "title": "A Family of Argonaute-Interacting Proteins Gates Nuclear RNAi.",
    "authors": {
      "abbreviation": "Alexandra Lewis, Ahmet C Berkyurek, Andre Greiner, ..., Thomas F Duchaine",
      "contacts": [
        {
          "name": "Thomas F Duchaine",
          "email": [
            "thomas.duchaine@mcgill.ca"
          ]
        }
      ],
      "authorList": [
        {
          "name": "Alexandra Lewis",
          "email": null,
          "abbrevName": "Lewis A",
          "isCollectiveName": false
        },
        {
          "name": "Ahmet C Berkyurek",
          "email": null,
          "abbrevName": "Berkyurek AC",
          "isCollectiveName": false
        },
        {
          "name": "Andre Greiner",
          "email": null,
          "abbrevName": "Greiner A",
          "isCollectiveName": false
        },
        {
          "name": "Ahilya N Sawh",
          "email": null,
          "abbrevName": "Sawh AN",
          "isCollectiveName": false
        },
        {
          "name": "Ajay Vashisht",
          "email": null,
          "abbrevName": "Vashisht A",
          "isCollectiveName": false
        },
        {
          "name": "Stephanie Merrett",
          "email": null,
          "abbrevName": "Merrett S",
          "isCollectiveName": false
        },
        {
          "name": "Mathieu N Flamand",
          "email": null,
          "abbrevName": "Flamand MN",
          "isCollectiveName": false
        },
        {
          "name": "James Wohlschlegel",
          "email": null,
          "abbrevName": "Wohlschlegel J",
          "isCollectiveName": false
        },
        {
          "name": "Mihail Sarov",
          "email": null,
          "abbrevName": "Sarov M",
          "isCollectiveName": false
        },
        {
          "name": "Eric A Miska",
          "email": null,
          "abbrevName": "Miska EA",
          "isCollectiveName": false
        },
        {
          "name": "Thomas F Duchaine",
          "email": "thomas.duchaine@mcgill.ca",
          "abbrevName": "Duchaine TF",
          "isCollectiveName": false
        }
      ]
    },
    "reference": "Mol. Cell (2020)",
    "abstract": "Nuclear RNA interference (RNAi) pathways work together with histone modifications to regulate gene expression and enact an adaptive response to transposable RNA elements. In the germline, nuclear RNAi can lead to trans-generational epigenetic inheritance (TEI) of gene silencing. We identified and characterized a family of nuclear Argonaute-interacting proteins (ENRIs) that control the strength and target specificity of nuclear RNAi in C. elegans, ensuring faithful inheritance of epigenetic memories. ENRI-1/2 prevent misloading of the nuclear Argonaute NRDE-3 with small RNAs that normally effect maternal piRNAs, which prevents precocious nuclear translocation of NRDE-3 in the early embryo. Additionally, they are negative regulators of nuclear RNAi triggered from exogenous sources. Loss of ENRI-3, an unstable protein expressed mostly in the male germline, misdirects the RNAi response to transposable elements and impairs TEI. The ENRIs determine the potency and specificity of nuclear RNAi responses by gating small RNAs into specific nuclear Argonautes.",
    "pmid": "32348780",
    "doi": "10.1016/j.molcel.2020.04.007"
  },
  "text": "",
  "article": {
    "MedlineCitation": {
      "Article": {
        "Abstract": "Nuclear RNA interference (RNAi) pathways work together with histone modifications to regulate gene expression and enact an adaptive response to transposable RNA elements. In the germline, nuclear RNAi can lead to trans-generational epigenetic inheritance (TEI) of gene silencing. We identified and characterized a family of nuclear Argonaute-interacting proteins (ENRIs) that control the strength and target specificity of nuclear RNAi in C. elegans, ensuring faithful inheritance of epigenetic memories. ENRI-1/2 prevent misloading of the nuclear Argonaute NRDE-3 with small RNAs that normally effect maternal piRNAs, which prevents precocious nuclear translocation of NRDE-3 in the early embryo. Additionally, they are negative regulators of nuclear RNAi triggered from exogenous sources. Loss of ENRI-3, an unstable protein expressed mostly in the male germline, misdirects the RNAi response to transposable elements and impairs TEI. The ENRIs determine the potency and specificity of nuclear RNAi responses by gating small RNAs into specific nuclear Argonautes.",
        "ArticleTitle": "A Family of Argonaute-Interacting Proteins Gates Nuclear RNAi.",
        "AuthorList": [
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biochemistry & Goodman Cancer Research Centre, McGill University, Montréal, QC H3A 1A3, Canada.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Alexandra",
            "Identifier": null,
            "Initials": "A",
            "LastName": "Lewis"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Ahmet C",
            "Identifier": null,
            "Initials": "AC",
            "LastName": "Berkyurek"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Molecular Cell Biology and Genetics, Max Planck Institute, 01307 Dresden, Germany.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Andre",
            "Identifier": null,
            "Initials": "A",
            "LastName": "Greiner"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biochemistry & Goodman Cancer Research Centre, McGill University, Montréal, QC H3A 1A3, Canada.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Ahilya N",
            "Identifier": null,
            "Initials": "AN",
            "LastName": "Sawh"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biological Chemistry, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Ajay",
            "Identifier": null,
            "Initials": "A",
            "LastName": "Vashisht"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Molecular Cell Biology and Genetics, Max Planck Institute, 01307 Dresden, Germany.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Stephanie",
            "Identifier": null,
            "Initials": "S",
            "LastName": "Merrett"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biochemistry & Goodman Cancer Research Centre, McGill University, Montréal, QC H3A 1A3, Canada.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Mathieu N",
            "Identifier": null,
            "Initials": "MN",
            "LastName": "Flamand"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biological Chemistry, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "James",
            "Identifier": null,
            "Initials": "J",
            "LastName": "Wohlschlegel"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Molecular Cell Biology and Genetics, Max Planck Institute, 01307 Dresden, Germany.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Mihail",
            "Identifier": null,
            "Initials": "M",
            "LastName": "Sarov"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK.",
                "email": null
              }
            ],
            "CollectiveName": null,
            "ForeName": "Eric A",
            "Identifier": null,
            "Initials": "EA",
            "LastName": "Miska"
          },
          {
            "AffiliationInfo": [
              {
                "Affiliation": "Department of Biochemistry & Goodman Cancer Research Centre, McGill University, Montréal, QC H3A 1A3, Canada. Electronic address: thomas.duchaine@mcgill.ca.",
                "email": [
                  "thomas.duchaine@mcgill.ca"
                ]
              }
            ],
            "CollectiveName": null,
            "ForeName": "Thomas F",
            "Identifier": null,
            "Initials": "TF",
            "LastName": "Duchaine"
          }
        ],
        "Journal": {
          "ISOAbbreviation": "Mol. Cell",
          "ISSN": "1097-4164",
          "JournalIssue": {
            "Issue": null,
            "PubDate": {
              "Day": "17",
              "Month": "Apr",
              "Year": "2020"
            },
            "Volume": null
          },
          "Title": "Molecular cell"
        }
      },
      "ChemicalList": null,
      "InvestigatorList": null,
      "KeywordList": [
        "22G-RNA",
        "Argonaute",
        "NRDE-3",
        "RNA interference",
        "RNAi",
        "piRNA",
        "trans-generational epigenetic inheritance"
      ],
      "MeshheadingList": null
    },
    "PubmedData": {
      "ArticleIdList": [
        {
          "IdType": "pubmed",
          "id": "32348780"
        },
        {
          "IdType": "doi",
          "id": "10.1016/j.molcel.2020.04.007"
        }
      ],
      "History": [
        {
          "PubMedPubDate": {
            "Day": "24",
            "Month": "06",
            "Year": "2019"
          },
          "PubStatus": "received"
        },
        {
          "PubMedPubDate": {
            "Day": "19",
            "Month": "02",
            "Year": "2020"
          },
          "PubStatus": "revised"
        },
        {
          "PubMedPubDate": {
            "Day": "06",
            "Month": "04",
            "Year": "2020"
          },
          "PubStatus": "accepted"
        },
        {
          "PubMedPubDate": {
            "Day": "30",
            "Month": "4",
            "Year": "2020"
          },
          "PubStatus": "entrez"
        },
        {
          "PubMedPubDate": {
            "Day": "30",
            "Month": "4",
            "Year": "2020"
          },
          "PubStatus": "pubmed"
        },
        {
          "PubMedPubDate": {
            "Day": "30",
            "Month": "4",
            "Year": "2020"
          },
          "PubStatus": "medline"
        }
      ],
      "ReferenceList": null
    }
  },
  "createdDate": "2020-05-06T17:57:40.357Z",
  "status": "trashed",
  "verified": true
}
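Here is a minimal sketch of how the two problems with the example document (its status and its empty `elements` array) could be detected. The function name `junk_reasons` and the `ALLOWED_STATUSES` set are hypothetical. The allowed statuses follow the reply in this thread, and the code assumes stored status values are lowercase, like `'trashed'` above. The empty-elements check is an extra heuristic; the thread does not decide whether empty documents should be excluded.

```python
import json
from typing import List

# Hypothetical allow-list, assumed from the reply in this thread.
# Stored values are assumed to be lowercase, like 'trashed' in the example.
ALLOWED_STATUSES = {"submitted", "published"}


def junk_reasons(doc: dict) -> List[str]:
    """Return the reasons a document looks like junk (empty list if none)."""
    reasons = []
    # Normalize case so 'SUBMITTED' and 'submitted' are treated alike.
    status = str(doc.get("status", "")).lower()
    if status not in ALLOWED_STATUSES:
        reasons.append(f"status is {status}")
    # Heuristic only: a document with no elements captures no pathway data.
    if not doc.get("elements"):
        reasons.append("no elements")
    return reasons


# A trimmed copy of the example document, keeping only the fields checked here.
example = json.loads("""
{
  "id": "7c06498a-7a4d-4a4c-ab3e-6ae82e8d869e",
  "elements": [],
  "status": "trashed"
}
""")

print(junk_reasons(example))  # ['status is trashed', 'no elements']
```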
jvwong commented 3 years ago

Document status should be either SUBMITTED or PUBLISHED.
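The rule in the reply above can be sketched as an allow-list filter over the documents going into the bulk download. This is an illustration, not the fix from the linked PR: `filter_for_bulk_download` and `INCLUDED_STATUSES` are hypothetical names, and the case-insensitive comparison is an assumption because the reply writes statuses in uppercase while the example document stores a lowercase value (`'trashed'`).

```python
from typing import Iterable, List

# Hypothetical allow-list of statuses to keep, per the reply above.
# Compared case-insensitively (assumption: stored values may be lowercase).
INCLUDED_STATUSES = {"submitted", "published"}


def filter_for_bulk_download(docs: Iterable[dict]) -> List[dict]:
    """Keep only documents whose status is SUBMITTED or PUBLISHED."""
    return [
        doc for doc in docs
        if str(doc.get("status", "")).lower() in INCLUDED_STATUSES
    ]


docs = [
    {"id": "a", "status": "published"},
    {"id": "b", "status": "trashed"},
    {"id": "c", "status": "submitted"},
    {"id": "d"},  # no status field at all
]
print([d["id"] for d in filter_for_bulk_download(docs)])  # ['a', 'c']
```

Filtering on an allow-list, rather than only excluding 'trashed', also drops documents with any other status or with no status at all, which matches the reply's framing of which statuses to include.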

Somewhat related to #821

maxkfranz commented 3 years ago

Addressed by PR #831