Closed hungpham3112 closed 1 year ago
Hi, and thank you for the bug report! Would you mind testing whether this still occurs after updating CSV.jl? Version 0.10.11 (tagged yesterday) includes https://github.com/JuliaData/CSV.jl/pull/1073, which is intended to fix this kind of issue.
I tested: the data race happens less often, but the problem is still there. Moreover, this plugin now sometimes causes Pluto to hang for about 5 minutes, I think because of the data race.
https://github.com/JuliaData/CSV.jl/assets/75968004/f6a5b0f0-fb1c-4b85-94a3-692f6212171a
My thought: if I run the code a single time, meaning I run it and wait until it finishes before continuing, there is no problem with the type. But if I run it many times in quick succession, like I do in the video, a data race happens across multiple cores (8 cores in my example). I don't know whether my reasoning is correct, so please explain it to me.
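One way to probe the multi-core hypothesis above is to compare a default parse with one restricted to a single parsing task. The following is only a sketch under stated assumptions: the in-memory `body` and the `headers` names are hypothetical stand-ins for the downloaded file, and such a tiny input is unlikely to trigger multithreaded parsing at all. It shows the shape of the comparison rather than a reproduction. `ntasks` is the CSV.jl keyword that controls how many tasks are used for parsing.

```julia
using CSV, DataFrames

# Hypothetical stand-in for HTTP.get(filename).body from the original report.
body = Vector{UInt8}("1,2.5\n2,3.5\n3,4.5\n")
headers = ["id", "value"]  # hypothetical column names

# Default parse: CSV.jl may split large inputs across several tasks.
df_default = CSV.read(body, DataFrame; header=headers)

# Single-task parse: rules out concurrent chunk parsing as a factor.
df_single = CSV.read(body, DataFrame; header=headers, ntasks=1)

# Compare the inferred element type of the first column in both cases.
(eltype(df_default[!, 1]), eltype(df_single[!, 1]))
```

If the type only varies in the default case on the real file, that would point toward the multithreaded parsing path rather than the file contents.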
Ah, that's unfortunate and unexpected. It seems I cannot reproduce the issue: I tried running a Pluto notebook with the same environment (JULIA_NUM_THREADS=8 JULIA_REVISE_WORKER_ONLY=1 ~/julia-1.9.0/bin/julia --startup-file=no -e "using Pluto; Pluto.run()") and I put the code of your initial message, one line per cell. Then I did as in your video, refreshing the df definition cell repeatedly, even just leaving Shift+Enter pressed down for a while, but I never saw the type of the first column change.
I also tried the following to automate things a bit:
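using CSV, DataFrames, HTTP

# `filename` and `headers` are defined as in the initial message.
body = HTTP.get(filename).body
for _ in 1:10000
    df2 = CSV.read(body, DataFrame, header=headers)
    # Stop as soon as the first column is inferred as anything other than Int64.
    if eltype(df2[!, 1]) != Int64
        error("Encountered: $(eltype(df2[!, 1]))")
    end
end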
but no error occurs.
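A variant of the loop above that tallies every inferred type instead of stopping at the first mismatch may make intermittent behavior easier to see. This is a minimal sketch, not the maintainer's method: the in-memory `body` and the `headers` names are hypothetical placeholders (the thread uses `HTTP.get(filename).body`), and a tiny input like this will not exercise multithreaded parsing, so substitute the real file when investigating.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the downloaded bytes used in the thread.
body = Vector{UInt8}("1,2.5,a\n2,3.5,b\n3,4.5,c\n")
headers = ["id", "value", "label"]  # hypothetical column names

# Count how often each element type is inferred for the first column.
counts = Dict{Type,Int}()
for _ in 1:1000
    df = CSV.read(body, DataFrame; header=headers)
    T = eltype(df[!, 1])
    counts[T] = get(counts, T, 0) + 1
end

counts
```

More than one key in `counts` would indicate that type inference is not deterministic for that input.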
Just to check whether it could be something else in the configuration, can you please check the output of Base.Threads.nthreads() in one cell of your Pluto notebook, as well as that of import Pkg; Pkg.status()? Mine yield, respectively, 8 and
Status `/tmp/jl_pNSR9l/Project.toml`
[336ed68f] CSV v0.10.11
[a93c6f00] DataFrames v1.5.0
[cd3eb016] HTTP v1.9.6
[44cfe95a] Pkg v1.9.0
[10745b16] Statistics v1.9.0
Here is the output:
I can reproduce the error with your setup; maybe your OS is different from mine. I'm testing on Windows 11 with PowerShell 7.2.
https://github.com/JuliaData/CSV.jl/assets/75968004/fd088aa2-00bc-489c-8cbe-50075e98e442
Thanks for checking: apparently you are still using CSV v0.10.10, but the bugfix I mentioned was only released in CSV v0.10.11, which explains why you are still seeing this bug.
Would you mind updating the package and letting us know whether the bug still occurs afterwards? To update, run Pkg.update("CSV") from a cell of your notebook (or simply Pkg.update() to update all packages in your environment): you should see a line stating
[336ed68f] ↑ CSV v0.10.10 ⇒ v0.10.11
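Because a notebook can use a different environment from the local REPL, it can help to confirm which CSV.jl version the running session actually loaded. The following is a hedged sketch assuming Julia 1.9 or later (where `pkgversion` is available); the v0.10.11 threshold comes from this thread.

```julia
using CSV

# Version of CSV.jl loaded in this session (pkgversion requires Julia 1.9+).
loaded = pkgversion(CSV)

# Project file of the active environment, to confirm which one the session uses.
project = Base.active_project()

if loaded === nothing || loaded < v"0.10.11"
    @warn "CSV.jl $loaded is loaded; the fix discussed here shipped in v0.10.11" project
else
    @info "CSV.jl $loaded is loaded" project
end
```

Running this in a notebook cell and in the local REPL should show whether the two are resolving different environments.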
I realized that I had only updated my local environment, not Pluto's; sorry about that. The first time I checked, the data race still occurred, but the second and third times everything was OK. There's something odd here, or maybe a problem with multithreading. We need more people to validate this behavior. Thanks.
Hi, I came back to the problem today and there is no data race anymore. My guess is that after I last updated CSV from v0.10.10 to v0.10.11, temporary files still existed on my local machine, so the bug still occurred. #1073 definitely fixes this issue. Thanks for the hard work. I will close this issue.
Steps to reproduce:
df = CSV.read(HTTP.get(filename).body, DataFrame, header=headers)
and see that the column sometimes changes its type. I tested the CSV file in Python, and the first column always has a fixed data type (Float64), so the problem is not with the CSV file. Then I tried the above snippet in both a Jupyter notebook and Pluto, and both show the same bug, so the problem is with CSV.read and CSV.File.
Vid:
https://github.com/JuliaData/CSV.jl/assets/75968004/dbcbe99e-85fa-4091-bddf-7a2cd1aa8e01
https://github.com/JuliaData/CSV.jl/assets/75968004/727fc383-d01b-4fa9-be24-d885b4c6024a
Versioninfo: