Closed tomgoddard closed 1 year ago
Thanks for flagging that broken link; it's fixed now. cc @ebetica for fixing the stats.parquet file: this one is based on an early version of the database, from before missing entries were filled in and duplicates were removed.
Do you have an idea when the stats.parquet file that corresponds to the current online ESM database will be available?
Since Zeming is out of office, I took this over. The stats file is now updated with the complete, non-redundant set of keys:
617051007
3948a44562b6bd4c184167465eec17de
Just in case anyone would want to reference it, the old file is copied to stats.old_bk.parquet under the same basepath.
Let me know if you encounter any further issues, closing this in the meantime. Will resolve #366 as well based on this file. Thank you again for flagging those issues!!
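For anyone who wants to check a downloaded copy against the values posted above, here is a minimal sketch. It assumes the 32-character hex string is an MD5 digest of the updated stats.parquet file, which the comment does not state explicitly. The helper name `md5_of_file` and the local file path are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for a very large file.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumption: the value posted in this thread is the MD5 of the new file.
EXPECTED_MD5 = "3948a44562b6bd4c184167465eec17de"

# Usage (hypothetical local path):
#   md5_of_file("stats.parquet") == EXPECTED_MD5
```

If the digests differ, the download may be incomplete, or the posted value may be some other kind of checksum.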
Thanks! My main interest in stats.parquet was to get the list of MGnify identifiers for the database so that I could create a file of all the sequences for searching. You provided the sequence file in #366, which has solved that problem. But I may still use stats.parquet to filter on my model scores.
Ah yes happy we have the right data in place now! :)
Here is an example of the duplicate entries in the stats.parquet file with the same MGnify id. They are the same except for differing ptm values. It seems there should not be duplicates, since I believe each MGnify id has only one structure prediction in the atlas.
The stats.parquet file I used is
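The kind of duplicate described above (same MGnify id, different ptm) can be detected with something like the sketch below. It works on plain `(id, ptm)` pairs so that it makes no assumption about the real column names in stats.parquet. The function name and the sample ids are hypothetical, not taken from the file.

```python
from collections import defaultdict


def find_conflicting_duplicates(records):
    """Find ids that appear with more than one distinct ptm value.

    records: iterable of (mgnify_id, ptm) pairs.
    Returns a dict mapping each such id to its sorted distinct ptm values.
    Ids repeated with an identical ptm are not reported.
    """
    ptms_by_id = defaultdict(set)
    for mgnify_id, ptm in records:
        ptms_by_id[mgnify_id].add(ptm)
    return {
        mgnify_id: sorted(ptms)
        for mgnify_id, ptms in ptms_by_id.items()
        if len(ptms) > 1
    }


# Made-up sample rows for illustration only.
sample = [
    ("ID_A", 0.52),
    ("ID_B", 0.71),
    ("ID_A", 0.60),  # same id as the first row, different ptm
]
conflicts = find_conflicting_duplicates(sample)
```

Here `conflicts` contains only `ID_A`, with both of its ptm values. For a file with hundreds of millions of rows, a dataframe library would be more practical than a Python loop; the loop is only meant to make the check explicit.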
The link to the stats.parquet file on the Atlas API web page is broken; it gives an Access Denied error.