Closed ningwei-wei closed 2 years ago
As far as I can make out, the file is reading correctly. data.table is simply telling you that the file ends in an unusual way, and that it took some time to identify this irregularity and handle it. Without the original file, all I can say is that the file should end with a newline, i.e. the character you get when you hit Enter to start a new line (which is not printed/visible).
I doubt there is anything for you to fix here, unless you are getting something unexpected.
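For anyone who wants to check this on their own file: one quick way to see whether a file ends with a newline is to inspect its last byte from the command line. A minimal sketch (the example file here is a stand-in for your own):

```shell
# Create a small example file; substitute your own path in practice.
printf 'col1\tcol2\n1\t2\n' > example.tsv

# Print the last byte of the file. A healthy text file ends with \n (0x0a);
# anything else is the "unusual ending" fread is warning about.
tail -c 1 example.tsv | od -c
```

If the output shows `\n`, the file ends cleanly; if it shows some other character, the final line has no terminating newline.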
I think the answer above addresses the question well. If you have more questions about this particular issue, let us know; we can always reopen the issue if there is something to improve.
Apologies for reviving this old thread, but I've also encountered this a few times now, and very weirdly it seems to occur only after compressing a file.
I was able to fread my file perfectly fine beforehand, but after compressing the file with gzip it now always throws this warning.
As you say, I don't think it's necessarily a problem, but it is a bit concerning to see the warning every time I read the file.
@Sabor117 How do you decompress your file? Do you do that yourself, or do you let data.table handle the decompression? Which data.table version are you using? If data.table is handling the decompression, we might be able to fix this if you provide us with a reprex.
I just use fread()
like this:
ukbb_scores_1 = fread("st03_01_scores.eqtlgen_ukb_prscs_ukb.tsv", data.table = FALSE)
And this is using data.table_1.14.6. Unfortunately, I'm not sure how I can really provide anything reproducible, as the data file in question is massive (16GB uncompressed) and also not something I can share anyway.
Without a reproducible example, it's really hard to give suggestions. In addition, the snippet you've shared loads a .tsv file, not a compressed one; we might be able to help if you show that code as well.
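Since the warning only appears after compressing, it may also be worth confirming that the archive itself is intact before suspecting fread. A sketch using standard gzip tooling (the file names here are placeholders, not the original data):

```shell
# Build a small compressed example; substitute your real file in practice.
printf 'a\tb\n1\t2\n' > sample.tsv
gzip -kf sample.tsv            # -k keeps sample.tsv, writes sample.tsv.gz

# Test archive integrity without writing the decompressed data to disk.
gzip -t sample.tsv.gz && echo "archive OK"

# Compare the decompressed stream's line count against expectations.
gzip -dc sample.tsv.gz | wc -l
```

If `gzip -t` fails or the line count is short, the problem is in the compression step rather than in fread.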
Sorry to revive this old thread again! I encounter the same error when reading in a compressed file, and I cannot read it even after I decompressed it.
For completeness, I've experienced exactly the same problem with big compressed files. The problem was that the decompression process saves a file in the /tmp folder of the OS (in my case, Ubuntu), and it was too small to hold all the data, so decompression stopped halfway through. Increasing the size of your /tmp could fix the problem.
For that I used:
sudo mount -o remount,exec,size=40gb /tmp
but it may not be appropriate in your specific case; please check the proper command for your system yourself.
The problem, I think, is that the error message is not very related to the problem at hand. Troubleshooting this bug would benefit from a more appropriate error message.
@BastienFR
Thanks. As usual, it would be nice to post the output with options(datatable.verbose=TRUE). That makes finding the correct error message a lot easier.
@ben-schwen
I've rerun my analysis to get the required information. Hopefully it's satisfactory. Please note that I work in a very tight corporate environment, so I can only provide limited-quality screenshots.
With options(datatable.verbose=FALSE) and /tmp = 5 GB: notice the message about the bad file ending (caused by the decompression failing). In hindsight, the warning about "problem writing to connection" could have rung a bell.
With options(datatable.verbose=TRUE) and /tmp = 5 GB: note that the top of the output is truncated (it didn't fit on my screen), but it is the same as in the next screenshot. Notice also that it read 5.000 GB with 40832623 rows, while the original file is longer than that (see below). Also, when datatable.verbose=TRUE, we don't see the message about the file-ending problem.
With options(datatable.verbose=TRUE) and /tmp = 40 GB: this worked. Note that the original file is 19.23 GB with 157014112 rows.
Given @BastienFR's detailed report, I think this issue should be renamed to "fread fails with unhelpful error message if data is gzipped and too large to fit into /tmp" and reopened. It might also belong to @HenrikBengtsson's https://github.com/HenrikBengtsson/R.utils. See also: https://www.linkedin.com/pulse/trivial-fix-after-3-hours-debugging-kirill-tsyganov
Hi, when I load my data I run into an error that I don't know how to solve.
Can you help me?