Open shezhou opened 3 months ago
hi, thanks for the report. Please provide the data to reproduce the problem as indicated in the template.
Thank you for your reply, here is the data: PRI_Basic.json. You can ignore `dtype=dtype_dict` in `df = pd.read_json(arg1, dtype=dtype_dict, lines=True)`.
In order to make the issue reproducible, please provide `dtype_dict` (and any other information necessary to reproduce).
OK, it seems that the issue is that you have one specific row with a very long string (of length 1988). Right now pyreadstat writes it as the dta type str, whose maximum length is 2045 bytes (roughly ~1020 Python characters). There seems to be a way to write the newer strL type, which can hold much longer strings (see here); I can look into implementing that in the future. For now the solution is to avoid writing such long strings, for example by splitting them into multiple columns, along the lines of the sketch below.
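For reference, a minimal sketch of that column-splitting workaround on the pandas side; `MAX_CHARS`, `split_long_strings`, and the `_partN` column names are illustrative assumptions, not anything provided by pyreadstat:

```python
# Workaround sketch: split over-long string columns into several shorter
# columns before writing the .dta file, so each piece fits in a plain str.
import pandas as pd
import pyreadstat

MAX_CHARS = 1000  # stay well below the 2045-byte limit of Stata's plain str type

def split_long_strings(df, max_chars=MAX_CHARS):
    out = df.copy()
    for col in df.columns:
        if df[col].dtype != object:
            continue  # only string/object columns can hit the limit
        text = df[col].fillna("").astype(str)
        longest = text.str.len().max()
        if pd.isna(longest) or longest <= max_chars:
            continue
        n_parts = -(-int(longest) // max_chars)  # ceiling division
        for i in range(n_parts):
            out[f"{col}_part{i + 1}"] = text.str[i * max_chars:(i + 1) * max_chars]
        out = out.drop(columns=[col])  # replace the original column with its parts
    return out

df = pd.read_json("PRI_Basic.json", lines=True)
pyreadstat.write_dta(split_long_strings(df), "PRI_Basic.dta")
```

Whoever reads the .dta file back would then need to concatenate the `_partN` columns to recover the original text (and keep an eye on Stata's 32-character variable-name limit when choosing suffixes).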
```python
import pandas as pd
import pyreadstat

arg1 = 'E:\test\single\dta\PRI_Basic.json'
arg3 = 'E:\test\single\dta\PRI_Basic.dta'
df = pd.read_json(arg1, dtype=dtype_dict, lines=True)
pyreadstat.write_dta(df, arg3)
```

The following error occurred:

```
pyreadstat._readstat_parser.ReadstatError: A provided string value was longer than the available storage size of the specified column
```

It seems to only solve the SAV format.