AlasdairGray closed this issue 3 years ago
Current query response over IDP-KG (Full), based on commit ac30059046932f00c94f9f301d0f08d29a89143d:
x | description | count |
---|---|---|
1 | Distinct Proteins (Union) | "2289"^^xsd:integer |
2 | DisProt Proteins | "1615"^^xsd:integer |
3 | MobiDB Proteins | "2073"^^xsd:integer |
4 | PED Proteins | "83"^^xsd:integer |
5 | DisProt \ (MobiDB U PED) | "179"^^xsd:integer |
6 | MobiDB \ (DisProt U PED) | "637"^^xsd:integer |
7 | PED \ (DisProt U MobiDB) | "33"^^xsd:integer |
8 | (DisProt U MobiDB) | "2256"^^xsd:integer |
9 | (DisProt U PED) | "1652"^^xsd:integer |
10 | (MobiDB U PED) | "2110"^^xsd:integer |
11 | DisProt n MobiDB | "1432"^^xsd:integer |
12 | DisProt n PED | "46"^^xsd:integer |
13 | MobiDB n PED | "46"^^xsd:integer |
14 | (DisProt n MobiDB) \ PED | "1390"^^xsd:integer |
15 | (DisProt n PED) \ MobiDB | "4"^^xsd:integer |
16 | (MobiDB n PED) \ DisProt | "4"^^xsd:integer |
17 | DisProt n MobiDB n PED | "42"^^xsd:integer |
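
For readers who want to reproduce the 17 rows outside the triplestore, the same set operations can be sketched in Python. This is only an illustration: the function name `overlap_counts` and the identifiers `P1` to `P6` are made up for the example, and it assumes each source has already been reduced to a set of protein identifiers. It is not the query used to produce the table above.

```python
def overlap_counts(disprot, mobidb, ped):
    """Return 17 overlap counts, keyed by descriptions mirroring the table rows."""
    d, m, p = set(disprot), set(mobidb), set(ped)
    return {
        "Distinct Proteins (Union)": len(d | m | p),
        "DisProt Proteins": len(d),
        "MobiDB Proteins": len(m),
        "PED Proteins": len(p),
        "DisProt \\ (MobiDB U PED)": len(d - (m | p)),
        "MobiDB \\ (DisProt U PED)": len(m - (d | p)),
        "PED \\ (DisProt U MobiDB)": len(p - (d | m)),
        "(DisProt U MobiDB)": len(d | m),
        "(DisProt U PED)": len(d | p),
        "(MobiDB U PED)": len(m | p),
        "DisProt n MobiDB": len(d & m),
        "DisProt n PED": len(d & p),
        "MobiDB n PED": len(m & p),
        "(DisProt n MobiDB) \\ PED": len((d & m) - p),
        "(DisProt n PED) \\ MobiDB": len((d & p) - m),
        "(MobiDB n PED) \\ DisProt": len((m & p) - d),
        "DisProt n MobiDB n PED": len(d & m & p),
    }


# Made-up identifiers, purely for illustration (not taken from IDP-KG).
disprot = {"P1", "P2", "P3", "P4"}
mobidb = {"P2", "P3", "P5"}
ped = {"P3", "P4", "P6"}

counts = overlap_counts(disprot, mobidb, ped)
for row, (description, count) in enumerate(counts.items(), start=1):
    print(f"{row} | {description} | {count}")
```

Keeping the descriptions as dictionary keys makes it easy to compare the output row by row against the query response.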
Running the query over the full scrape dataset in commit 4b4c7ae5db27e2bc39267140080f55c3b8ebb9de:
x | description | count |
---|---|---|
1 | Distinct Proteins (Union) | "2718"^^xsd:integer |
2 | DisProt Proteins | "2062"^^xsd:integer |
3 | MobiDB Proteins | "2073"^^xsd:integer |
4 | PED Proteins | "90"^^xsd:integer |
5 | DisProt \ (MobiDB U PED) | "604"^^xsd:integer |
6 | MobiDB \ (DisProt U PED) | "617"^^xsd:integer |
7 | PED \ (DisProt U MobiDB) | "34"^^xsd:integer |
8 | (DisProt U MobiDB) | "2684"^^xsd:integer |
9 | (DisProt U PED) | "2101"^^xsd:integer |
10 | (MobiDB U PED) | "2114"^^xsd:integer |
11 | DisProt n MobiDB | "1451"^^xsd:integer |
12 | DisProt n PED | "51"^^xsd:integer |
13 | MobiDB n PED | "49"^^xsd:integer |
14 | (DisProt n MobiDB) \ PED | "1407"^^xsd:integer |
15 | (DisProt n PED) \ MobiDB | "7"^^xsd:integer |
16 | (MobiDB n PED) \ DisProt | "5"^^xsd:integer |
17 | DisProt n MobiDB n PED | "44"^^xsd:integer |
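
As a sanity check on both responses, the derived rows (1, 5 to 10 and 14 to 16) can be recomputed from the per-source totals and the intersections (rows 2 to 4, 11 to 13 and 17) using inclusion-exclusion. The sketch below does that with the numbers reported in the two tables; the helper name `derived_counts` and its keys are hypothetical, chosen only for this example.

```python
def derived_counts(d, m, p, dm, dp, mp, dmp):
    """Derive the union and difference rows from set sizes and intersection sizes.

    d, m, p   : sizes of DisProt, MobiDB and PED
    dm, dp, mp: sizes of the pairwise intersections
    dmp       : size of the three-way intersection
    """
    return {
        "union": d + m + p - dm - dp - mp + dmp,  # row 1
        "d_only": d - dm - dp + dmp,              # row 5
        "m_only": m - dm - mp + dmp,              # row 6
        "p_only": p - dp - mp + dmp,              # row 7
        "d_or_m": d + m - dm,                     # row 8
        "d_or_p": d + p - dp,                     # row 9
        "m_or_p": m + p - mp,                     # row 10
        "dm_not_p": dm - dmp,                     # row 14
        "dp_not_m": dp - dmp,                     # row 15
        "mp_not_d": mp - dmp,                     # row 16
    }


# Rows 2-4, 11-13 and 17 as reported for each run.
idp_kg = derived_counts(d=1615, m=2073, p=83, dm=1432, dp=46, mp=46, dmp=42)
full_scrape = derived_counts(d=2062, m=2073, p=90, dm=1451, dp=51, mp=49, dmp=44)

print(idp_kg["union"], full_scrape["union"])  # prints: 2289 2718
```

Both runs are internally consistent under this check, so the differences between them come from the underlying data rather than from the counting.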
Analysis is now taking place in a Google Sheet.
Found 147 deprecated proteins in DisProt, and one MobiDB entry that was not scraped properly.
The 2021-09-28 version is now in sync with what is available on the websites of the data sources. The problems were caused by using named graphs as proxies for pages and by deprecated proteins being included in the sitemaps.
Message received from Ivan with details of overlap between the three datasets (summarised in attached figure):