Closed varioustoxins closed 8 months ago
If you're just reading a file looking for a specific tag, SAS has a separate parser for this: one that returns tag-value pairs.
I think if you wanted to Do It Right(tm) you'd add a Row class to the loop -- like sqlite3 does it: optional via row_factory. Jon?
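For reference, the sqlite3 row_factory pattern mentioned above looks like this: setting the factory makes each fetched row accessible by column name as well as by index (the table and values here are made up for illustration).

```python
import sqlite3

# sqlite3's optional row_factory: with sqlite3.Row installed,
# rows support both index-based and name-based access.
con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
con.execute("CREATE TABLE t (Ordinal INTEGER, Family_name TEXT)")
con.execute("INSERT INTO t VALUES (1, 'Cornilescu')")

row = con.execute("SELECT * FROM t").fetchone()
print(row["Family_name"])  # Cornilescu
print(row[0])              # 1
```

The appeal of this design is that it is opt-in: code that wants plain tuples is unaffected, while code that wants named access sets the factory once on the connection.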
> If you're just reading a file looking for specific tag, SAS has a separate parser for this

Not quite sure I follow the SAS reference.
Here is something which exists already and I believe meets your needs:
>>> a = pynmrstar.Entry.from_database(15000)
>>> l = a[0][0]
>>> l.get_tag(['ordinal', 'family_name'], dict_result=True)
[{'Ordinal': '1', 'Family_name': 'Cornilescu'}, {'Ordinal': '2', 'Family_name': 'Cornilescu'}, {'Ordinal': '3', 'Family_name': 'Hadley'}, {'Ordinal': '4', 'Family_name': 'Gellman'}, {'Ordinal': '5', 'Family_name': 'Markley'}]
The only difference is that you need to specify which tags you want. The presumption here is that you don't need the dictionary to contain values your code doesn't know how to handle, since presumably they would just be ignored anyway. It also raises an exception if you ask for a tag that isn't present, which makes it clear where the issue is right away, versus getting a generator of values and only realizing later, via an IndexError, that a particular key is missing.
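The fail-fast behavior described above can be illustrated with plain dictionaries; `get_columns` is a hypothetical helper, not part of PyNMRSTAR, that checks the requested keys up front rather than failing midway through consumption.

```python
rows = [{"Ordinal": "1", "Family_name": "Cornilescu"},
        {"Ordinal": "2", "Family_name": "Cornilescu"}]

def get_columns(rows, keys):
    """Fail fast: raise immediately if any requested key is absent,
    instead of surfacing the problem later while iterating results."""
    missing = [k for k in keys if rows and k not in rows[0]]
    if missing:
        raise KeyError(f"tags not present: {missing}")
    return [[row[k] for k in keys] for row in rows]

print(get_columns(rows, ["Ordinal"]))  # [['1'], ['2']]
```

Asking for a nonexistent column raises immediately at the call site, which is the same debugging benefit the comment above attributes to get_tag().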
If you really do need all the tags in the dictionary though, you could do the following:
l.get_tag(l.tags, dict_result=True)
Though that is admittedly a little clunky looking, it will work fine.
> If you're just reading a file looking for specific tag, SAS has a separate parser for this

Not quite sure I follow the SAS reference.
https://github.com/bmrb-io/SAS -- there is a python3 branch that passes basic tests.
> Here is something which exists already and I believe meets your needs:
Though looking at this further, it really should return the tag names with the same capitalization you use to query them. Otherwise if your specified capitalization doesn't match the file, you'll run into an annoying discrepancy. I'll look into updating this code.
Does this meet your needs? If not, if you describe exactly what sort of operation you're performing on the loop as you iterate through the rows, I may be able to provide an idiomatic way to do it.
One other thing I realized I didn't mention on this issue before is that Loop.get_tag() can take None as the list of tags, which means "all tags", so you can get a list of dictionaries for the loop tags via Loop.get_tag(dict_result=True).
My version is a bit friendlier for me, as I have the basic conversions I need built in (str->int, str->float) and can do a row at a time rather than slurp the whole lot (though who cares these days, SOOOO much memory). I'd forgotten I had sent my jiffy in before, and #124 uses it...
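The row-at-a-time approach with built-in conversions might be sketched like this; `iter_rows` is a hypothetical helper, and `tags`/`data` mimic the attributes a pynmrstar Loop exposes rather than using the library directly.

```python
def iter_rows(tags, data, converters=None):
    """Yield one dict per row, lazily, applying optional per-tag
    conversion functions (defaulting to str)."""
    converters = converters or {}
    for row in data:
        yield {tag: converters.get(tag, str)(value)
               for tag, value in zip(tags, row)}

tags = ["Ordinal", "Value"]
data = [["1", "1.2"], ["2", "1.99"]]

for row in iter_rows(tags, data, {"Ordinal": int, "Value": float}):
    print(row)  # e.g. {'Ordinal': 1, 'Value': 1.2}
```

Because it is a generator, it never materializes the whole loop as dictionaries, which is the memory point raised above (even if, as noted, that rarely matters now).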
nb one other question: if I want to build my own schema for validation, how do I do that, as PyNMRStar doesn't read mmcif dics? Is there another tool I need (happy to open a separate issue)?
regards Gary
On 3/6/24 11:34, varioustoxins wrote:
> nb one other question if I want to build my own schema for validation how do I do that as PyNMRStar doesn't read mmcif dics. Is there another tool I need
https://github.com/bmrb-io/SAS has both mmcif and "ddl" (for pdbx .dic file) parsers. You'll likely need to hack the ddl one to work with your file, but it is relatively straightforward.
Dimitri
@varioustoxins - To be honest, while I had written the code to support different versions of the BMRB schema, I hadn't put much work into generic schema handling as it wasn't relevant. To wit, PyNMR-STAR loads a CSV used internally to generate the BMRB DDL rather than the BMRB DDL.
I just made a new release to improve the support of other schemas. There is still a caveat that you'll need to write it in CSV format rather than DDL, but I have attached an example here showing how straightforward it would be to convert your schema into CSV to use with PyNMR-STAR.
With 3.3.4:
Example loop file:
loop_
_Test.Ordinal
_Test.Name
_Test.Value
_Test.Description
1 first_thing 1.2 'something very important'
2 second_thing 1.99 'ignore this'
3 way_too_long_of_name 3 'cannot be this long'
stop_
Example dictionary:
Dictionary sequence,Tag,Data Type,BMRB data type,Loopflag,Nullable,public,SFCategory,ADIT category view type
TBL_BEGIN,,,,,,,,v.1
10,_Test.Ordinal,INTEGER,int,Y,,Y,_Test,
20,_Test.Name,VARCHAR(12),code,Y,NOT NULL,Y,_Test,
30,_Test.Value,FLOAT,float,Y,NOT NULL,Y,_Test,
40,_Test.Description,TEXT,text,Y,,Y,_Test,
50,_Test.Verified,CHAR(3),yes_no,Y,NOT NULL,Y,_Test,
60,_Test.Internal,TEXT,line,Y,,I,_Test,
TBL_END,,,,,,,,
>>> import pynmrstar
>>> s = pynmrstar.Schema('schema.csv')
>>> l = pynmrstar.Loop.from_file('example.test', schema=s, convert_data_types=True)
>>> l.data
[[1, 'first_thing', Decimal('1.2'), 'something very important'], [2, 'second_thing', Decimal('1.99'), 'ignore this'], [3, 'way_too_long_of_name', Decimal('3'), 'cannot be this long']]
>>> print(s)
BMRB schema from: 'schema.csv' version 'v.1'
Tag_Prefix Tag Type Null_Allowed SF_Category
_Test
Ordinal INTEGER True _Test
Name VARCHAR(12) False _Test
Value FLOAT False _Test
Description TEXT True _Test
Verified CHAR(3) False _Test
Internal TEXT True _Test
>>> l.validate(schema=s)
["Length of '20' is too long for 'VARCHAR(12)': '_Test.Name':'way_too_long_of_name'."]
The example shows not just that you can validate using the specified dictionary: if you use it in combination with the convert_data_types=True argument when parsing an Entry/Saveframe/Loop, the data types are also converted automatically, according to the specified schema. That functionality has been present for a long time, but 3.3.4 lets you use a custom schema when parsing, which wasn't previously supported.
schema.csv
example.txt
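The validation message shown above can be understood as a length check against the declared column type. This is a rough standalone sketch of that idea, not PyNMRSTAR's actual implementation; the function name and message format are assumptions modeled on the output shown earlier.

```python
import re

def check_varchar(tag, col_type, value):
    """Return an error string if value exceeds the VARCHAR(n) limit
    declared for this column, else None."""
    match = re.fullmatch(r"VARCHAR\((\d+)\)", col_type)
    if match and len(value) > int(match.group(1)):
        return (f"Length of '{len(value)}' is too long for "
                f"'{col_type}': '{tag}':'{value}'.")
    return None

print(check_varchar("_Test.Name", "VARCHAR(12)", "way_too_long_of_name"))
```

Running this on the third row of the example file reproduces the same complaint validate() printed: the 20-character name exceeds the 12-character limit.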
Hi John
Thank you so much for the reply and sorry about the slow reply / comments
Comments below and one more question
Where can I get support on using the BMRB web api, I had some questions…
For example can I list all shift lists that contain CA C N CB* shifts without also downloading all the data
Regards Gary
Dr Gary S Thompson NMR Facility Manager CCPN CoI & Working Group Member Wellcome Trust Biomolecular NMR Facility School of Biosciences, Division of Natural Sciences University of Kent, Canterbury, Kent, England, CT2 7NZ
On 13 Mar 2024, at 20:06, Jon Wedell wrote:
> @varioustoxins - To be honest, while I had written the code to support different versions of the BMRB schema, I hadn't put much work into generic schema handling as it wasn't relevant. To wit, PyNMR-STAR loads a CSV used internally to generate the BMRB DDL rather than the BMRB DDL.
;-)
> I just made a new release to improve the support of other schemas. There is still a caveat that you'll need to write it in CSV format rather than DDL, but I have attached an example here showing how straightforward it would be to convert your schema into CSV to use with PyNMR-STAR.
Thank you!
> With 3.3.4: [quoted example loop file and dictionary, as above]
Ok, lots of questions here!
Yes.
Exactly. The number is completely arbitrary, it just must increase from row to row. It can change from version to version without consequence.
The supported Data Type values: https://github.com/bmrb-io/PyNMRSTAR/blob/c84160cf024aeabeafa77394a8d629b620341d2d/pynmrstar/schema.py#L111
The BMRB data type values: https://github.com/bmrb-io/PyNMRSTAR/blob/v3/pynmrstar/reference_files/data_types.csv
Y for tags that are part of a loop, N for tags that are part of a saveframe.
Indeed.
This is mainly internal - there are non-public tags that are stripped when NMR-STAR files are released. For your use case, it probably makes sense to set every tag to Y.
The rows can have null in the ADIT category column, but you should keep the v.1 (or something, whatever you want) in the second row; this is required.
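Putting the column answers above together, a schema CSV can be generated programmatically; this sketch uses the stdlib csv module, with the header and the TBL_BEGIN/TBL_END sentinel rows copied from the example dictionary shown earlier in the thread.

```python
import csv
import io

# Header columns match the example schema CSV quoted above.
header = ["Dictionary sequence", "Tag", "Data Type", "BMRB data type",
          "Loopflag", "Nullable", "public", "SFCategory",
          "ADIT category view type"]

rows = [
    # The version string (here "v.1") in the TBL_BEGIN row is required.
    ["TBL_BEGIN", "", "", "", "", "", "", "", "v.1"],
    # Sequence numbers are arbitrary; they just must increase row to row.
    ["10", "_Test.Ordinal", "INTEGER", "int", "Y", "", "Y", "_Test", ""],
    ["20", "_Test.Name", "VARCHAR(12)", "code", "Y", "NOT NULL", "Y", "_Test", ""],
    ["TBL_END", "", "", "", "", "", "", "", ""],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

Writing the file this way avoids hand-editing CSV quoting, and makes it easy to generate the schema from an existing tag list.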
For your question about accessing chemical shift data, please send me an e-mail and I'll be happy to follow up with more information. I'd prefer to keep GitHub issues for bugs/feature requests.
Cheers, Jon
I have this function which makes working with rows in loops much easier. Would it be possible to add it as a method on Loop, or am I missing the right idiom for dealing with loops?
To be clear, I was doing things like
which seems clunky