Are the files compressed? Currently we don't support BINARY compression.
Here's the header as reported by the Python package sas7bdat:
col_count_p1: 10
col_count_p2: 0
column_count: 10
compression: None
creator: None
creator_proc: CONNECT
date_created: 2009-10-10 13:16:16.041934
date_modified: 2009-10-10 13:16:16.041934
endianess: little
file_type: DATA
filename: somedata.sas7bdat
header_length: 8192
lcp: 7
lcs: 0
mix_page_row_count: 87
name: somedata.sas7bdat
os_name: x86_64
os_type: 2.6.18-92.1.22.e
page_count: 2987773
page_length: 8192
platform: unix
row_count: 433226881
row_length: 56
sas_release: 9.0201M0
server_type: Linux
u64: True
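(For scale, by my arithmetic that's row_count * row_length = 433226881 * 56 ≈ 24 GB of raw data, so this is a very large file.)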
Do you have a (smaller) file you can send me that demonstrates the problem? Since it's a general parse error I have no idea what the problem might be. (Use readstat_error_message to convert the error code to a string.)
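For reference, here's roughly how I'd wire that up (a minimal sketch; readstat_parser_init, readstat_parse_sas7bdat, and readstat_error_message are the actual API, the rest is scaffolding):

#include <stdio.h>
#include "readstat.h"

/* Parse a sas7bdat file with no handlers registered and print a
 * human-readable message for whatever error code comes back. */
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.sas7bdat\n", argv[0]);
        return 1;
    }
    readstat_parser_t *parser = readstat_parser_init();
    readstat_error_t error = readstat_parse_sas7bdat(parser, argv[1], NULL);
    readstat_parser_free(parser);
    if (error != READSTAT_OK) {
        /* readstat_error_message turns the numeric code (e.g. 5)
         * into a string */
        fprintf(stderr, "Error: %s\n", readstat_error_message(error));
        return 1;
    }
    return 0;
}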
I wish I could share. If you are interested, I have dug in a bit. I apologize for the lack of line numbers; I've made a few changes to the code.
One file reads fine until near the end, at which point it sets retval = READSTAT_ERROR_PARSE just after the declaration of sas_parse_catalog_page, near line 700 of readstat_sas.c.
The second file doesn't read at all. It sets retval = READSTAT_ERROR_PARSE near line 860 of readstat_sas.c:
if (len > 0 && compression != SAS_COMPRESSION_TRUNC) {
    if (offset > page_size || offset + len > page_size ||
            offset < off+24+subheader_count*lshp) {
        retval = READSTAT_ERROR_PARSE;
        printf("Foobar 19");
        /*
        goto cleanup;
        */
At this point compression = 64, but the file is not compressed (at least according to the Python package, which never invokes either of its decompression algorithms when reading the file).
Of course I get a segfault when I let execution continue past this point.
I'm wondering whether one of the int16_t values is wrapping around somewhere.
I received similar reports from other users and was able to track down the issue. An integer overflow was producing a negative offset in the file, which caused things to go haywire at around the 2GB mark. This commit ought to fix it:
https://github.com/WizardMac/ReadStat/commit/1bae419a277c5f81aa84427e68a17b406a62d263
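If you're curious, the failure mode looks roughly like this (an illustrative sketch, not the exact expression from readstat_sas.c; strictly speaking signed overflow is undefined behavior in C, but on common platforms it wraps negative):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Offsets were effectively computed in 32-bit arithmetic. With
     * page_length = 8192, the product exceeds INT32_MAX at page
     * 262144, i.e. right around the 2GB mark. */
    int32_t page_length = 8192;
    int32_t page_index  = 300000;  /* a page past the 2GB boundary */

    int32_t bad  = page_index * page_length;           /* wraps negative */
    int64_t good = (int64_t)page_index * page_length;  /* widen first */

    printf("32-bit offset: %d\n", bad);
    printf("64-bit offset: %lld\n", (long long)good);
    return 0;
}

A negative offset then fails the bounds checks you quoted, or sends the reader seeking into garbage, which would be consistent with both the bogus compression value and the segfault you saw.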
Are you using the latest code? That code snippet looks out of date. I've made additional fixes in the last month which may resolve the issue for you.
The library gives an error 5 for certain (very large) sas7bdat files. I haven't been able to pin down the problem myself. The files can be read with the Python sas7bdat package (https://pypi.python.org/pypi/sas7bdat).
Any tips on the kinds of issues I should be thinking about?