Closed JanMarvin closed 2 years ago
Research indicates that the information is stored on a page of PAGE_TYPE 640, after the data information. In a synthetic test file (similar to data.frame(x = 1:3)) the following values were found. The comment following the file name is the value; the comment following the function call is the SAS call. The test files were created using an x64 SAS. If the hex value is changed to a different value, the SAS output changes. E.g., change 0x40 to 0x80 in test2 and the result will be test3. It is not clear how the values are constructed.
fl <- "../sas7bdat/test2.sas7bdat" # 64
dd <- read.sas(fl, FALSE) # delete x = 2
fl <- "../sas7bdat/test3.sas7bdat" # 96
dd <- read.sas(fl, FALSE) # delete x > 1
fl <- "../sas7bdat/test4.sas7bdat" # 128
dd <- read.sas(fl, FALSE) # delete x = 1
fl <- "../sas7bdat/test5.sas7bdat" # 192
dd <- read.sas(fl, FALSE) # delete x < 3
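Since it is unclear how the values are constructed, it may help to look at them as hex and as bit patterns. The following is only a sketch for inspecting the four values from the listing above; byte_bits() is a hypothetical helper and not part of readsas.

```r
# Marker values observed in the synthetic test files, together with the
# SAS delete statement that produced each file (taken from the listing above).
observed <- data.frame(
  file   = c("test2", "test3", "test4", "test5"),
  value  = c(64L, 96L, 128L, 192L),
  delete = c("x = 2", "x > 1", "x = 1", "x < 3"),
  stringsAsFactors = FALSE
)

# Render one byte as eight bits, most significant bit first.
# Hypothetical helper for illustration only.
byte_bits <- function(v) {
  bits <- as.integer(intToBits(v))[8:1]
  paste(bits, collapse = "")
}

observed$hex  <- sprintf("0x%02X", observed$value)
observed$bits <- vapply(observed$value, byte_bits, character(1))
print(observed)
```

In these four files the leading three bits are 010, 011, 100 and 110, which lines up with the rows removed by each delete statement if the bits are read from the most significant end, one bit per row. This is an observation on four three-row files only and has not been confirmed against larger files.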
The value appears to be a double. Is it either the zero-based number of the row to be deleted (e.g., 2), or a negative number indicating how many rows are to be deleted from the top (e.g., -0 or -2)?
Found another page type, PAGE_TYPE 384. Here the last double of the page seems to indicate which of the rows has to be removed. The position of the double on PAGE_TYPE 640 is still unknown.
Certain sas7bdat files contain rows deleted by the SAS user prior to writing the file. These rows are usually in the middle or at the end of the file. Presumably SAS is lazy about removing and repositioning data in output files and instead simply marks the rows to be ignored.
Right now these rows are still imported by readsas. Therefore the imported dataset might differ from what SAS shows. The information about which row(s) to ignore is assumed to be at the end of case 1.
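Until the marker is decoded, flagged rows could be dropped after import. The sketch below assumes, without confirmation, that the marker is a single byte acting as a bitmask read from the most significant bit, one bit per row, which happens to fit the four synthetic test files. drop_deleted() is a hypothetical helper and not part of readsas.

```r
# Drop rows flagged as deleted.
# ASSUMPTION (unconfirmed): `marker` is one byte whose bits, read from the
# most significant end, flag rows 1, 2, 3, ... as deleted.
drop_deleted <- function(df, marker) {
  n <- nrow(df)
  stopifnot(n <= 8L)                          # one byte flags at most 8 rows
  bits <- as.integer(intToBits(marker))[8:1]  # most significant bit first
  deleted <- bits[seq_len(n)] == 1L
  df[!deleted, , drop = FALSE]
}

dd <- data.frame(x = 1:3)
drop_deleted(dd, 64L)   # marker value seen in test2 (delete x = 2)
drop_deleted(dd, 192L)  # marker value seen in test5 (delete x < 3)
```

A real implementation would have to handle files with more than eight rows and would need the actual position and encoding of the marker, both of which are still unknown.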