Closed AdamOpps closed 4 months ago
Hey @AdamOpps, this is great, thanks for investigating. Do you have any of this implemented in code? I'll look through my internal nda/ndax files and see if this additional data shows up in the same places.
Yeah, I have written some very simple helper functions that will search for the nda version, and then use that to find the mass or remarks. Would you like me to submit a PR into the dev branch, or submit it someplace else?
I started looking for active mass in some Neware 4000 files, and I think the number should be a 4-byte unsigned int. When I read 8 bytes, some of the files returned nonsense values. Can you check if 4 bytes still gives correct values for your files?
NewareNDA already returns some additional information via logging, so I'm thinking it's best to return active mass this way too. Otherwise there will need to be a new keyword argument, and I don't really want to get into post-processing.
A 4-byte unsigned integer is what I meant. Thanks for the clarification! I think I misread ImHex's data identifier, and that is what I quoted above.
I think putting the Active Material and Remarks into the log would be appropriate! I would advocate, however, for including columns for Specific Capacity directly into the returned dataframe from the read functions. If mass isn't specified, we could fill the column with zeros or NaNs or something. I doubt it would break compatibility for users with this solution. Let me know your thoughts though!
I created a new PR #59 for extracting active mass as logging information. I prefer this approach since many of my test files have an active mass of zero. For Neware 9000 I just looked for the mass bytes from the end of the file, but I only have 3 files to test. Can you check if this works on more files?
I gave it some tests, check out my comments! #59
I have delved into the depths of binary data extraction and have successfully found Active Mass, Remarks, and some other interesting data points for .nda files generated by Neware 4000 and 9000 instruments. I do not have access to .ndax files, so I cannot provide any information about those unfortunately.
NEWARE 4000 (and perhaps other machines)
The entered "Active Material" value exists at byte 0x00000098 (decimal: 152).
It appears that the data is stored in LE as a 4-byte uint (BTS Software does not let you enter a negative number.)
The value decoded is actually the ActiveMaterial*1000 (in other words, to extract active material, divide this decoded value by 1000)
When specifying Active Material in BTS Software, the default is mg, but there are other options for units, including some volumetric types. I do not know how changing units affects this value.
The "Remarks" section exists at byte 0x0000090D (dec: 2317)
It is simply an ASCII/UTF-8 encoded string.
I suspect that it is 100 Bytes long, but this is untested.
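The Neware 4000 offsets above can be sketched in Python roughly as follows. This is only a minimal illustration, not the NewareNDA implementation: the function name `read_nda4000_metadata` is hypothetical, the 100-byte remarks length is the untested guess from above, and null padding of the remarks field is an assumption.

```python
import struct

ACTIVE_MASS_OFFSET = 152  # 0x00000098, as reported above
REMARKS_OFFSET = 2317     # 0x0000090D, as reported above
REMARKS_LENGTH = 100      # assumption: suspected length, untested


def read_nda4000_metadata(data: bytes) -> dict:
    """Hypothetical helper: decode active material and remarks from raw .nda bytes."""
    # 4-byte little-endian unsigned int holding ActiveMaterial * 1000
    (raw_mass,) = struct.unpack_from("<I", data, ACTIVE_MASS_OFFSET)

    raw_remarks = data[REMARKS_OFFSET:REMARKS_OFFSET + REMARKS_LENGTH]
    # Assumption: the field is padded with null bytes, so cut at the first one
    remarks = raw_remarks.split(b"\x00", 1)[0].decode("utf-8", errors="replace").strip()

    return {"active_material": raw_mass / 1000, "remarks": remarks}
```

The returned value is in whatever unit was chosen in BTS Software (mg by default), since it is not known how other units affect the stored number.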
NEWARE 9000
The same metadata type information is stored in the .nda files, but as a footer to the data, unlike the 4000 case.
I have identified the footer as beginning with the following byte string:
06 00 f0 1d 81 00 03 00 61 90 71 90 02 7f ff 00
This string marks the beginning of the footer where our relevant metadata is stored.
I have seen this exact string in a few of my datafiles, but it is very possible that at least part of it is not static. I just have not seen that case yet.
The first 107 bytes after the signifier are not decoded yet (and I have no plans to do so at the moment.)
The next 128 bytes are ASCII text for the "Creator" or "Operator"
The next 128 bytes are ASCII text for the "P/N"
The next 128 bytes are ASCII text for the "Remark Information" or "Comments"
The next 8 bytes are a little-endian double-precision float for the "Active Material"
There is more in the footer following this, but it does not seem important to me at the moment. There is another big chunk of ASCII, so feel free to investigate at your interest.
Table of 9k Footer Storage Structure:
Bytes after signifier | dtype  | data
1-107                 | ?      | Unknown
108-235               | ASCII  | Creator (Operator)
236-363               | ASCII  | P/n
364-491               | ASCII  | Remarks (Comments)
492-499               | double | Active Material
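For anyone who wants to try this on their own Neware 9000 files, the footer layout could be decoded along these lines. Treat it as a sketch under the assumptions already stated: the signifier is assumed to be static, the text fields are assumed to be null padded, and the name `read_nda9000_footer` is made up for illustration.

```python
import struct
from typing import Optional

# Signifier bytes quoted above; assumed static, which has not been verified
FOOTER_SIGNIFIER = bytes.fromhex("06 00 f0 1d 81 00 03 00 61 90 71 90 02 7f ff 00")

# 107 undecoded bytes, three 128-byte text fields, one little-endian double
FOOTER_LAYOUT = struct.Struct("<107x128s128s128sd")


def _text(field: bytes) -> str:
    # Assumption: text fields are padded with null bytes
    return field.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_nda9000_footer(data: bytes) -> Optional[dict]:
    """Hypothetical helper: decode the metadata footer, or return None if absent."""
    # Search from the end of the file, since the metadata is stored as a footer
    start = data.rfind(FOOTER_SIGNIFIER)
    if start == -1:
        return None
    body = start + len(FOOTER_SIGNIFIER)
    if len(data) - body < FOOTER_LAYOUT.size:
        return None  # file too short to hold the full footer
    creator, part_number, remarks, mass = FOOTER_LAYOUT.unpack_from(data, body)
    return {
        "creator": _text(creator),
        "part_number": _text(part_number),
        "remarks": _text(remarks),
        "active_material": mass,  # returned as decoded, no scaling applied
    }
```

Returning `None` when the signifier is missing keeps the caller in charge of deciding whether that is an error, which seems safer while the signifier is still unconfirmed.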
I leave implementation details to the more experienced among us. I can certainly take a stab at adding some extraction code, as I plan on doing so for my personal work in the meantime. However, I don't know whether any of these extractions should be added to the main "read()" functions, or if they should be their own methods.
I can imagine extracting the mass and adding dedicated "Specific Capacity" Columns to the main read_nda() function. It is also possible to have some separate function along the lines of a "read_metadata()" function that returns a simple dictionary with Remarks, Active Material, P/N, Run Date, etc. as we find them. Either implementation would be valuable in my opinion.
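As a rough idea of what the "Specific Capacity" calculation might look like, here is a small standalone sketch. It assumes capacity values in mAh and an active material mass in mg (the BTS default), and it returns NaN when no mass was entered. The function name and units are my assumptions, not anything that exists in NewareNDA.

```python
import math


def specific_capacity(capacity_mah, active_mass_mg):
    """Hypothetical helper: convert capacities (mAh) to specific capacities (mAh/g).

    Returns NaN for every point when the mass is missing, zero, or negative.
    """
    if not active_mass_mg or active_mass_mg <= 0:
        return [math.nan] * len(capacity_mah)
    mass_g = active_mass_mg / 1000  # mg -> g
    return [c / mass_g for c in capacity_mah]
```

Using NaN rather than zero for the missing-mass case makes it obvious that the column was never computed, instead of suggesting a real specific capacity of zero.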
There are a few other data structures that I have found in the Neware 4000 file format. In short, they aren't useful, so I won't provide too much detail. One instance is a data structure that extracts with the same format as the main data points, but it has a different identifier ('\xAA' instead of '\x55'). These data blocks appear to be the last data point of each step. Within these data blocks, there are 8 bytes of unidentified data that appear to be values for Differential Capacity.