JuliaIO / MAT.jl

Julia module for reading MATLAB files
MIT License
278 stars 71 forks source link

Resolve deprecation warnings for v0.7 #86

Closed halleysfifthinc closed 5 years ago

halleysfifthinc commented 6 years ago

According to the testsuite, this fixes all the deprecations.

The only things I'm unsure of is the change in v0.7 for reinterpret/reshape to return reinterpret/reshape types. Since the tests check type equality (and error if mismatched type) before checking for equality in the contents, this required me to add a collect (for most of the cases) around the reinterpret or reshape. This will now allocate when it did not before, and may affect the read/write performance, however I don't see a way around this (other than changing the types of the test data to make the tests pass with reinterpret/reshape types).

halleysfifthinc commented 6 years ago

Tests fail due to the issues with WinRPM and Homebrew..

Evizero commented 6 years ago

What is the state on this?

halleysfifthinc commented 6 years ago

My understanding is that the PR is ready to be merged, depending on how important passing CI is for whoever has merge permissions. Tests pass locally on my machine (Linux x64), but the last run of Travis failed on 1.0 (bad travis config for 1.0, no using Pkg), tests on 0.7 failed because CMakeWrapper failed to build, Blosc failed because of CMakeWrapper. HDF5 failed to install because the apt sources returned 404's; my guess is that is a Travis issue, and I have no clue if there is anything we can do to solve that on our end.

Obviously, once this gets merged, any issues with 1.0 can be fixed.

halleysfifthinc commented 6 years ago

The single OSX failure on Travis is due to the gcc link failing in homebrew. Not sure what caused the Linux failure.

halleysfifthinc commented 6 years ago

All green! @simonster or @yuyichao could we get this merged now?

aliddell commented 6 years ago

All green! @simonster or @yuyichao could we get this merged now?

+1

yuyichao commented 5 years ago

Since the tests check type equality (and error if mismatched type) before checking for equality in the contents, this required me to add a collect (for most of the cases) around the reinterpret or reshape. This will now allocate when it did not before, and may affect the read/write performance, however I don't see a way around this (other than changing the types of the test data to make the tests pass with reinterpret/reshape types).

Most of those looks wrong. You should now allocate the array you want to return and fill the reinterpreted/reshaped version.

halleysfifthinc commented 5 years ago

Ok. I will see about changing that.

bjarthur commented 5 years ago

looking forward to having this!

i think it would be okay for MAT.jl to return ReinterpretArray's. so no need for a collect when reading. but h5d_write probably cannot deal with those, so a collect might be needed when writing. i presume though that the previous behavior of reinterpret did a collect internally, so the performance might be similar for writes and better for reads. thoughts @yuyichao ?

halleysfifthinc commented 5 years ago

While I don't think that returning ReinterpretArrays would be ok, I do agree with you regarding the performance of reinterpret/collect vs. Julia v0.6. My intuition/understanding is that is how reinterpret previously worked, so putting a collect on the reinterpreted arrays should give at least equivalent performance.

At the same time, I share @yuyichao sense of wrongness about collecting reinterpreted arrays. However, in looking at the first couple of occurrences of reinterpret, I'm not sure how to cleanly implement his suggestion without dramatically restructuring the code, and I'm not comfortable enough with this code to do that.

Perhaps we could merge and tag this now, and then the collects and reinterprets could be dealt with later? There seem to be a decent number of people waiting for MAT to be updated.

bjarthur commented 5 years ago

i get this warning when testing this PR on julia 0.7 on linux:

┌ Warning: Vector{UInt8}(s::String) will copy data in the future. To avoid copying, use `unsafe_wrap` or `codeunits` instead.
│   caller = close(::MAT.MAT_HDF5.MatlabHDF5File) at MAT_HDF5.jl:65
└ @ MAT.MAT_HDF5 ~/.julia/dev/MAT/src/MAT_HDF5.jl:65
halleysfifthinc commented 5 years ago

I just fixed that warning and removed nearly all of the reinterprets. I'm not sure why v1 is failing on OSX (but passing on v0.7 and nightly), also, the logs on the passing builds were incomplete and still showed as passing. I'll see if rerunning Travis changes anything.

halleysfifthinc commented 5 years ago

Anyone have a mac and could investigate this? I'm on Ubuntu 18.04 x86_64 with hdf5-tools at version 1.10.0.

Travis seems a bit wonky today. The logs are delayed and/or truncated, so I can't look to see if there is any difference between the passing and not passing OSX runs.

bjarthur commented 5 years ago

tests pass locally for me on OSX for julia 1.0.

@yuyichao any further comments?

halleysfifthinc commented 5 years ago

Regarding the last two places with the extra copies, I'm not sure how avoid them without returning a reinterpret type array. I recognize that @yuyichao doesn't have the time right to look into them more, perhaps @bjarthur has some suggestions? Otherwise this can wait until someone with more experience with Julia can take a look.

yuyichao commented 5 years ago

And the reason I believe they are all fixable without any extra copy is that I believe all what the C library (or HDF5 format/package) should care about is a memory buffer to accept data/fill data in. The type tag on it should not matter so allocating the correct final type and let the library fill the data in should always work.

This might not be compatible with the existing internal or HDF5 API and so fixing this properly might require a non-local change involving internal or external (API) change/addition, which is what I don't really have time to investigate now...

halleysfifthinc commented 5 years ago

That is my sense of things as well. Given that Travis passes now, could this be merged and the extra copies be opened as an issue?

yuyichao commented 5 years ago

I strongly perfer to NOT do that. There should be no rush to get everything working on 1.0 since

  1. 0.6 still works.
  2. I don't believe anyone will actually fix the extra copy if it was not fixed now. It'll be very hard to just figure out which copy is "extra" after this is merged.
nicoleepp commented 5 years ago

https://github.com/JuliaIO/HDF5.jl/pull/520 should make reinterpret work where there currently are extra copies.

bjarthur commented 5 years ago

given https://github.com/JuliaIO/HDF5.jl/pull/520, does anything need to change in this PR still? or are we go for merge?

yuyichao commented 5 years ago

Yes, a few copy still needs to be removed here.

zjroth commented 5 years ago

Would changing

toarray(x::Array{Bool}) = reinterpret.(UInt8, x)

to

function toarray(x::Array{Bool})
    x_uint8 = reinterpret(UInt8, x)
    unsafe_wrap(Array{UInt8}, pointer(x_uint8), size(x_uint8))
end

get rid of the copy in question in an acceptable way? It seems to work on my system.

yuyichao commented 5 years ago

No that's illegal

zjroth commented 5 years ago

No that's illegal

Okay. Sorry for the noise.

th85ac commented 5 years ago

Not running for me under Julia 1.0.1 (mac os x)... all other packages like HDF5 working

oxinabox commented 5 years ago

... Given that Travis passes now, could this be merged and the extra copies be opened as an issue?

re:

I strongly perfer to NOT do that. There should be no rush to get everything working on 1.0 since

  1. 0.6 still works.
  2. I don't believe anyone will actually fix the extra copy if it was not fixed now. It'll be very hard to just figure out which copy is "extra" after this is merged.

@yuyichao is this still your strong perference?

Re point 1. Given now a lot of us have migrated fully off 0.6 (or potentially are being blocked from migration by this)

Obviously this will not effect 0.6 users since this is tagged 0.7+

Re point 2. Has the extra copies been fixed now in the last set of changes? I see some are tagged as fixing extra copies.

timholy commented 5 years ago

@yuyichao, I greatly appreciate the time you're taking to review the various PRs here, but I'd like to put in a plug for getting this resolved in a way that doesn't turn the perfect into the enemy of the good. We know of many imperfections throughout Julia itself and in many other packages, yet often we soldier on, doing the best we can at the time. There are times where an imperfect solution would actively undermine progress, but if the issue here is simply making some extra copies then this doesn't seem to be in that category. (It doesn't have exported API implications, for example, right?)

Unlike you, I haven't taken the time to review the various PRs, so I haven't followed the underlying issues. Would it be reasonable to merge something that at least works and then file an issue detailing the deficiencies that need fixing?

yuyichao commented 5 years ago

I believe the copying issue is only needed to maintain the output type which shouldn't break most of the use so merging PRs that doesn't include that part is certainly fine and I've done that already.

oxinabox commented 5 years ago

Just to be clear master is still broken in 1.0 because a ton of the basic deprecatation removing stuff has not been done (because it is done in this PR).

And master is also broken on 0.7 is also broken because of the reinterpretted output types thing.

halleysfifthinc commented 5 years ago

I noticed the same thing about the updates to master; it was unusable for me (as of a week ago).

I haven't had the time recently to take a look at this again, but the brief look I took at the PR to HDF5 linked earlier in this thread looked to be on the right track. I may have some time in the next week or two to take another stab at this.

On Fri, Nov 16, 2018, 10:30 AM Lyndon White notifications@github.com wrote:

Just to be clear master is still broken in 1.0 because a ton of the basic deprecatation removing stuff has not been done (because it is done in this PR).

And master is also broken on 0.7 is also broken because of the reinterpretted output types thing.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/JuliaIO/MAT.jl/pull/86#issuecomment-439429605, or mute the thread https://github.com/notifications/unsubscribe-auth/AHA_LfV4pYyWJaf6515Cq70slOMckWqVks5uvtoygaJpZM4VnKUN .

-- -AH

nicoleepp commented 5 years ago

The HDF5 changes needed to fix the extra copies here haven't been tagged yet: https://github.com/JuliaIO/HDF5.jl/issues/535

oxinabox commented 5 years ago

Here is the METADATA PR. It is currently kinda stalled https://github.com/JuliaLang/METADATA.jl/pull/19579

oxinabox commented 5 years ago

The HDF5 changes are merged and tagged. Thanks @musm

halleysfifthinc commented 5 years ago

I think that should fix all the extra copies. If somebody (@yuyichao perhaps) can review to double-check, then this will be good to go once JuliaIO/HDF5.jl#538 is merged and tagged.

yuyichao commented 5 years ago

See https://github.com/JuliaIO/HDF5.jl/pull/538#issuecomment-445867472

davidssmith commented 5 years ago

Can we get this merged?

Do you know what is worse than a performance penalty? A broken master. That's infinitely bad performance.

yuyichao commented 5 years ago

Not yet (see comment) and no. Old julia version aren't gone yet.

Also note that master isn't completely broken (some cases works) and I've said many times that whoever want to split out part of this to another PR are welcome to do so. I've merged such PR and I've even done the work of fixing that PR and rebas this on the new master.

Also note that this version is also "broken" awaiting the HDF5 PR linked above.

davidssmith commented 5 years ago

Well it is broken for me, and I created a PR a long time ago that made it work for me, but it was rejected for the reason that it was incomplete, even though it fixed some things.

Don't let perfect be the enemy of the good.

yuyichao commented 5 years ago

I am very surprised. If you are talking about https://github.com/JuliaIO/MAT.jl/pull/92. I believe it's included in https://github.com/JuliaIO/MAT.jl/pull/100, if not, feel free to include the missing pieces. Not being complete is not an issue. https://github.com/JuliaIO/MAT.jl/pull/100 is just like that. However, if it does not completely fix all issues on 1.0, it should not break 0.6.

davidssmith commented 5 years ago

There should be no rush to get everything working on 1.0 since

Also I want to directly disagree with this. Many packages depend on this as part of their backend. The performance of reading and writing MAT files is not as important as allowing all of those packages to upgrade to 1.0+, where they may enjoy speed improvements in execution and better likelihood of adoption. As it is now, anyone who hears about Julia, goes to download 1.0, and tries to install MAT or some package depending on MAT, will get errors. That's not the way to win hearts and minds.

yuyichao commented 5 years ago

You are just giving very good reason for why the release 1.0 shouldn't have been made so quickly. 0.6 is still a perfectly good release to use and start with. I know this is directly conflicting with the super loose release policy and I don't want to argue with either.

Also, I'm not looking for perfect either. It's just about not introducing regression. And if you want to know what is the right way to do this on a very much related change? https://github.com/JuliaLang/julia/pull/21831#issuecomment-302180760 Noticed that the PRs are submitted to the packages to make sure there's no regression before the base PR is merged.

oxinabox commented 5 years ago

However, if it does not completely fix all issues on 1.0, it should not break 0.6.

I must be missing something. This PR will change the minimum julia version to be 0.7 It thus cannot break julia 0.6.

If you are saying that you don't want to lose the ability to make improvements to 0.6, then have a backports branch. I am maintaining 2 packages that have backports branchs and it isn't that hard.

yuyichao commented 5 years ago

This PR will change the minimum julia version to be 0.7

Please read and quote the context.

Not being complete is not an issue. #100 is just like that. However, if it does not completely fix all issues on 1.0, it should not break 0.6.

It's specifically referring to PRs that are "not complete" and "like #100".

babaq commented 5 years ago

is this branch dead or not, I think it's better get merged and make it work first, and then make another PR to fix performance regression. v1.0 is relatively big change to v0.6 that's why it's better to move to v1.0 early than doing more work in v0.6. please make this work early, cause a lot of data is in MAT format.

chriselrod commented 5 years ago

FWIW, this branch works for me, but the merged one errors.

jaakkor2 commented 5 years ago

Looks like Travis CI was run two months ago. For me this PR passes tests using HDF5 v0.10.4. One line fix is needed for HDF5 v0.11.0. Who is able to trigger Travis? Looks like closing and reopening 18 days ago did not do it.

timholy commented 5 years ago

Superseded by #110 (merged in #113)