BlueBrain / nexus

Blue Brain Nexus - A knowledge graph for data-driven science
https://bluebrainnexus.io/
Apache License 2.0

Delta Requirements for 1.9 #3904

Open bilalesi opened 1 year ago

bilalesi commented 1 year ago
  1. In the search pages, the listing of resources with distributions has no contentUrl.

    • Allow the payload to carry the resourceId and contentUrl of the distribution binaries
      • Current measures: current-distribution-fetch.png
  2. The archives/write ACL must be granted to all users in all orgs/projects so that users can download archives.

  3. Resources (files and distribution binaries) are copied from production, so when the user selects them, the download fails with the error UnknownAccessTokenIssuer:

download from prod in dev.gif

  4. The file name is limited by the path property of the payload of the PUT /archives endpoint.

  5. My Data page

    • Resources must be those of the logged-in user, not the global resources: the My Data page is meant to show only the logged-in user's resources, but currently it returns all the resources in Delta.
  6. Resources & Studios

    • Sorting resources by label does not work when passing (-)label, because Delta has multiple names for the field (label, labelName) and multiple formats (array, string) 🤔
  7. The download limits should be investigated. The concrete use case at hand is a set of 200 morphologies (maybe 100-200MB in total).

  8. Studio improvement - get batch information for each row 3892

  9. File name and extension are missing in many studio resources: 3899

  10. Sometimes resources.self is an array or undefined - 3901

  11. Bug - Resources should not have multiple projects - 3990 (Move to backend)

  12. Sometimes distribution.label and contentType are arrays
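
Several of the items above come down to fields that arrive either as a string or as an array, sometimes under different names (label vs labelName, distribution.label, contentType). A minimal sketch of how a client could normalize them before sorting, written in Python for illustration only; the helper names `as_list`, `first_value` and `sort_by_label` are hypothetical and not part of Delta or Fusion:

```python
from typing import Any, List, Optional

# Candidate property names for a label, taken from the issue text.
LABEL_KEYS = ["label", "labelName"]


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list, pass lists through, and map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first_value(resource: dict, keys: List[str]) -> Optional[Any]:
    """Return the first value found under any of the candidate keys."""
    for key in keys:
        values = as_list(resource.get(key))
        if values:
            return values[0]
    return None


def sort_by_label(resources: List[dict], descending: bool = False) -> List[dict]:
    """Sort by normalized label (case-insensitive); unlabelled resources go last."""
    labelled = [r for r in resources if first_value(r, LABEL_KEYS) is not None]
    unlabelled = [r for r in resources if first_value(r, LABEL_KEYS) is None]
    labelled.sort(
        key=lambda r: str(first_value(r, LABEL_KEYS)).lower(),
        reverse=descending,
    )
    return labelled + unlabelled
```

This only hides the inconsistency on the client side; it does not replace fixing the data or the index in the backend.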

imsdu commented 1 year ago
  1. This is not easy because:
    • We are not the ones responsible for how distributions are modeled
    • The download URL is the _self, and it may contain a rev or a tag
    • It may point to another project
    • The link may be curied
    • The link may be broken
    • It could be external?
    What we could do is create an endpoint that parses a list of potential _self values to extract the project / the id / the rev / the tag. The result would be a list of:
    • _self values that were properly parsed
    • _self values that could not be parsed (external link, unknown project, ...)
    This could be used to create archives, but also to validate that the links of a contribution are valid.
  2. For archives acls:
    • Add archives/write for anonymous on all public projects
    • Add archives/write for bbp users
  3. The local user is not known to prod; we can extend the ACLs in dev though.
  4. We can't really do anything about that while archives rely on the tar format; the path can be overridden with a fallback.
  5. We should be able to list resources created by the user and last updated by the user
    • For now, we can't list resources created by the user OR last updated by the user
    • We can't list the resources where the user contributed to an intermediary revision
  6. Since the number of studios is low, this can be done on the frontend side. We could also improve the default Elasticsearch index to make it possible for all resources (including studios) if there is a larger need.
  7. Once the fix on archives is implemented, it should be OK.
  8. An example?
  9. To be discussed with DKE: what would be the best approach? Maybe provide a fallback on the Nexus file?
  10. Should we provide a way to block the creation of a resource with a property starting with a _?
  11. Same
  12. An example?
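
The parsing idea from the first answer (split a list of candidate _self values into parsed and unparseable ones) could look roughly like the sketch below. It is only an illustration in Python, not an existing Delta endpoint. It assumes the path layout seen in the URLs of this thread (`{base}/{endpoint}/{org}/{project}/.../{id}`) and assumes that rev and tag are passed as query parameters; the names `ParsedSelf`, `parse_self` and `partition_selves` are hypothetical. Checking that the project actually exists would need a lookup that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, unquote, urlsplit


@dataclass
class ParsedSelf:
    endpoint: str      # first segment after the base, e.g. "resources" or "files"
    org: str
    project: str
    resource_id: str   # last path segment, URL-decoded (may still be a curie)
    rev: Optional[int] = None
    tag: Optional[str] = None


def parse_self(url: str, base: str) -> Optional[ParsedSelf]:
    """Parse one _self URL; return None if it does not fit the assumed layout."""
    prefix = base.rstrip("/") + "/"
    if not url.startswith(prefix):
        return None  # external link, bare curie, or another deployment
    parts = urlsplit(url)
    base_path = urlsplit(prefix).path
    # Split before decoding so an encoded "/" inside the id stays in one segment.
    segments = [unquote(s) for s in parts.path[len(base_path):].split("/") if s]
    if len(segments) < 4:
        return None
    query = parse_qs(parts.query)
    rev = None
    if "rev" in query:
        try:
            rev = int(query["rev"][0])
        except ValueError:
            return None  # malformed revision
    tag = query.get("tag", [None])[0]
    return ParsedSelf(segments[0], segments[1], segments[2], segments[-1], rev, tag)


def partition_selves(urls: List[str], base: str) -> Tuple[List[ParsedSelf], List[str]]:
    """Return (properly parsed values, values that could not be parsed)."""
    parsed: List[ParsedSelf] = []
    rejected: List[str] = []
    for url in urls:
        result = parse_self(url, base)
        if result is None:
            rejected.append(url)
        else:
            parsed.append(result)
    return parsed, rejected
```

The parsed list could feed an archive payload, and the rejected list could be reported back to the user or used to validate the links of a contribution.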
Dinika commented 1 year ago

For point 8 (batching response for studios), here's an example.

Consider this studio in staging.

For the table row named "PATO - the Phenotype And Trait Ontology", the SPARQL query response only gives me the following information:

{
    "label": {
        "type": "literal",
        "value": "PATO - the Phenotype And Trait Ontology"
    },
    "self": {
        "type": "uri",
        "value": "https://staging.nise.bbp.epfl.ch/nexus/v1/resources/neurosciencegraph/datamodels/ontologies/obo:pato.owl"
    }
}

Since the above does not contain properties like distribution, project, createdAt, etc. (which are needed to show the user data such as file size in the download panel), I need to make another request to the resource endpoint (https://staging.nise.bbp.epfl.ch/nexus/v1/resources/neurosciencegraph/datamodels/ontologies/obo:pato.owl), which gives me the following response (note that the frontend does not need many of the things sent back in this response, such as defines, context, versionInfo):

{
    "@context": [
        "https://bluebrain.github.io/nexus/contexts/metadata.json",
        "https://neuroshapes.org"
    ],
    "@id": "obo:pato.owl",
    "@type": "Ontology",
    "defines": [
        {
            "@id": "obo:PATO_0000001",
            "@type": "Class",
            "atlasRelease": {
                "@id": "https://bbp.epfl.ch/neurosciencegraph/data/brainatlasrelease/c96c71a8-4c0d-4bc1-8a1a-141d9ed6693d",
                "_rev": 9
            },
            "http://www.geneontology.org/formats/oboInOwl#hasAlternativeId": "PATO:0000072",
            "http://www.geneontology.org/formats/oboInOwl#hasOBONamespace": "quality",
            "http://www.geneontology.org/formats/oboInOwl#id": "PATO:0000001",
            "label": "quality",
            "obo:IAO_0000115": "A dependent entity that inheres in a bearer by virtue of how the bearer is related to other entities",
            "obo:IAO_0000589": "quality (PATO)"
        },
        // Redacted - there are ~35 definitions here.
    ],
    "description": "An ontology of phenotypic qualities (properties, attributes or characteristics).",
    "distribution": [
        {
            "@type": "DataDownload",
            "atLocation": {
                "@type": "Location",
                "store": {
                    "@id": "nxv:diskStorageDefault"
                }
            },
            "contentSize": {
                "unitCode": "bytes",
                "value": 147611
            },
            "contentUrl": "https://staging.nise.bbp.epfl.ch/nexus/v1/files/neurosciencegraph/datamodels/40641598-0eff-40e9-9e55-7cea05e37ac9",
            "digest": {
                "algorithm": "SHA-256",
                "value": "148d4584e1373d90f1298492133135bb988ca9399d972025bdf58be140239aae"
            },
            "encodingFormat": "text/turtle",
            "name": "pato.ttl"
        },
        {
            "@type": "DataDownload",
            "atLocation": {
                "@type": "Location",
                "store": {
                    "@id": "nxv:diskStorageDefault"
                }
            },
            "contentSize": {
                "unitCode": "bytes",
                "value": 28804
            },
            "contentUrl": "https://staging.nise.bbp.epfl.ch/nexus/v1/files/neurosciencegraph/datamodels/5164951f-4f2b-4672-8003-29c256ca89db",
            "digest": {
                "algorithm": "SHA-256",
                "value": "b0bd577e28c5b3542d8bcc7d7402d7fbe76f2a31443dcefe502ffd2205c7dfcc"
            },
            "encodingFormat": "application/ld+json",
            "name": "pato.json"
        },
        {
            "@type": "DataDownload",
            "atLocation": {
                "@type": "Location",
                "store": {
                    "@id": "nxv:diskStorageDefault"
                }
            },
            "contentSize": {
                "unitCode": "bytes",
                "value": 5219
            },
            "contentUrl": "https://staging.nise.bbp.epfl.ch/nexus/v1/files/neurosciencegraph/datamodels/a8a6f746-6a20-4f78-8234-ca729b0008ad",
            "digest": {
                "algorithm": "SHA-256",
                "value": "f7f4dd9b640c7d0c326b556c45d1d298e26ce39a6e7f3f9aaa1ee9cd7b5e60b1"
            },
            "encodingFormat": "text/csv",
            "name": "pato.csv"
        }
    ],
    "http://www.geneontology.org/formats/oboInOwl#default-namespace": "quality",
    "http://www.geneontology.org/formats/oboInOwl#hasOBOFormatVersion": "1.2",
    "label": "PATO - the Phenotype And Trait Ontology",
    "owl:versionIRI": {
        "@id": "obo:pato/releases/2020-02-09/pato.owl"
    },
    "prefLabel": "PATO - the Phenotype And Trait Ontology",
    "title": "PATO - the Phenotype And Trait Ontology",
    "versionInfo": "R63",
    "_constrainedBy": "https://neuroshapes.org/dash/ontology",
    "_createdAt": "2022-05-27T07:39:25.590Z",
    "_createdBy": "https://staging.nise.bbp.epfl.ch/nexus/v1/realms/serviceaccounts/users/service-account-brain-modeling-ontology-ci-cd",
    "_deprecated": false,
    "_incoming": "https://staging.nise.bbp.epfl.ch/nexus/v1/resources/neurosciencegraph/datamodels/ontologies/obo:pato.owl/incoming",
    "_outgoing": "https://staging.nise.bbp.epfl.ch/nexus/v1/resources/neurosciencegraph/datamodels/ontologies/obo:pato.owl/outgoing",
    "_project": "https://staging.nise.bbp.epfl.ch/nexus/v1/projects/neurosciencegraph/datamodels",
    "_rev": 46,
    "_schemaProject": "https://staging.nise.bbp.epfl.ch/nexus/v1/projects/neurosciencegraph/datamodels",
    "_self": "https://staging.nise.bbp.epfl.ch/nexus/v1/resources/neurosciencegraph/datamodels/ontologies/obo:pato.owl",
    "_updatedAt": "2023-05-30T15:26:05.292Z",
    "_updatedBy": "https://staging.nise.bbp.epfl.ch/nexus/v1/realms/serviceaccounts/users/service-account-brain-modeling-ontology-ci-cd"
}

Since the @id field in this response is a CURIE, another request (this time with the query param format=expanded) is needed to retrieve the full URI (http://purl.obolibrary.org/obo/pato.owl).

The above 2 requests need to be made for every row that the user selects.
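
To make the per-row cost concrete, here is a rough sketch of the two requests and of the small subset of fields the download panel actually uses. It is written in Python for illustration only. `fetch_json` is a hypothetical injected HTTP helper (it would also carry the auth token), `describe_row` is a hypothetical name, and the expanded response is assumed to be a list of node objects, as in standard JSON-LD expanded form.

```python
from typing import Any, Callable, Dict, List


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list, pass lists through, and map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def describe_row(self_url: str, fetch_json: Callable[[str], Any]) -> Dict[str, Any]:
    """Issue the two per-row requests and keep only the download-panel fields."""
    # Request 1: compacted resource (distribution, _project, _createdAt, ...).
    compacted = fetch_json(self_url)
    # Request 2: expanded form, needed only to turn the curie @id into a full URI.
    separator = "&" if "?" in self_url else "?"
    expanded = fetch_json(self_url + separator + "format=expanded")
    nodes = as_list(expanded)
    full_id = nodes[0].get("@id") if nodes else compacted.get("@id")

    files = []
    for dist in as_list(compacted.get("distribution")):
        size = dist.get("contentSize")
        files.append(
            {
                "name": dist.get("name"),
                "contentUrl": dist.get("contentUrl"),
                "encodingFormat": dist.get("encodingFormat"),
                "size": size.get("value") if isinstance(size, dict) else None,
            }
        )
    return {
        "id": full_id,
        "project": compacted.get("_project"),
        "createdAt": compacted.get("_createdAt"),
        "files": files,
    }
```

With N selected rows this issues 2N requests, which is what a batch endpoint returning just these fields would avoid.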