DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.
Other
1.74k stars 387 forks source link

Fix GNU/Hurd build #668

Closed sthibaul closed 1 year ago

sthibaul commented 1 year ago

There is no PATH_MAX limitation on GNU/Hurd, and realpath() can be safely be used with its second parameter set to NULL (as required by posix since its version 2001).

sthibaul commented 1 year ago

Nice job!!

You need to use the LEPT_FREE() macro instead of free() in line 1924

Ok, done so!

DanBloomberg commented 1 year ago

Is thee any problem with the older version of realpath()? Wouldn't the underlying implementation be identical for the two invocations? Have you run into situations where PATH_MAX (4096) is exceeded?

sthibaul commented 1 year ago

Is thee any problem with the older version of realpath()? Wouldn't the underlying implementation be identical for the two invocations?

I don't understand your question.

On non-glibc old systems, you are not allowed to pass NULL as second parameter, so it always uses the passed buffer to store the result, without passing the size, thus the need for the PATH_MAX constant to tell how long it's supposed to be, and thus making the value of PATH_MAX cast into stone otherwise buffer overflows happen.

On glibc or posix 2001 systems, you can pass NULL so that the buffer is then allocated to the right size, and we don't have to rely on the arbitrary PATH_MAX constant.

DanBloomberg commented 1 year ago

I do understand. Have you run into any situation where the path is longer than 4096 bytes?

sthibaul commented 1 year ago

I do understand. Have you run into any situation where the path is longer than 4096 bytes?

Not for now, but such arbitrary limits are most often not really tested, and are thus a source of buffer overflows etc.

DanBloomberg commented 1 year ago

I see that realpath() will allocate a buffer up to PATH_MAX bytes, so that limit always applies. It is just a question of whether the buffer is allocated with a fixed size on the stack or malloc'd at run-time.

DanBloomberg commented 1 year ago

And the manpage describes the design bug as "what happens when PATH_MAX is not defined?" However, such a situation is not fixed by your PR. Have you run into a situation where PATH_MAX is not defined?

sthibaul commented 1 year ago

I see that realpath() will allocate a buffer up to PATH_MAX bytes, so that limit always applies.

On Linux, yes, because it keeps that limitation.

And the manpage describes the design bug as "what happens when PATH_MAX is not defined?" However, such a situation is not fixed by your PR.

It does: in the case the PR introduces, PATH_MAX is not used at all.

Have you run into a situation where PATH_MAX is not defined?

As mentioned in the PR, GNU/Hurd does not define PATH_MAX, precisely to avoid such fixed buffer sizes questions.

DanBloomberg commented 1 year ago

OK, I misread your statement about PATH_MAX not being defined in GNU/Hurd. And I didn't realize that development of the GNU/Hurd microkernel OS was still going on, given the success and flexibility of linux. Just for my interest, do you have any document that describes the project goals and where it currently is?

I asked you all those questions earlier because software is a messy business. Anything worth doing is worth maintaining, and software always need to be maintained. Two of my rules for maintenance are (1) if it works, don't fix it (2) simpler (often, more 'elegant') is better. I'm usually happy to break (1) to satisfy (2). In this case, since rule (1) actually fails, we'll merge, and thank you for the help.

Dan

sthibaul commented 1 year ago

Just for my interest, do you have any document that describes the project goals and where it currently is?

https://www.gnu.org/software/hurd/

(1) if it works, don't fix it (2) simpler (often, more 'elegant') is better.

Well, the simplest would be to just assume posix 2001 behavior by passing NULL to realpath() and let it allocate the result :)

DanBloomberg commented 1 year ago

The manpage says:

      The  resolved_path == NULL  feature,  not  standardized  in
       POSIX.1-2001, but standardized  in  POSIX.1-2008,  allows  this  design
       problem to be avoided.

So 2008 instead of 2001. Would you recommend assuming posix 2008? Seems safe to me.

sthibaul commented 1 year ago

So 2008 instead of 2001

Oh, indeed, 2001 only said it was implementation-defined behavior. Now fixing my notes.

Would you recommend assuming posix 2008? Seems safe to me.

It depends on your typical userbase :) glibc on Linux has been supporting it since its introduction of realpath, so Linux is completely safe with it. I don't know exactly for other OSes.

DanBloomberg commented 1 year ago

We do have the odd user on DEC or Sun machines. But glibc is nearly universal, android and iOS stay up to date on posix, so it seems worth taking the chance. I'll give it a try.

DanBloomberg commented 1 year ago

No, we'll leave PATH_MAX in.

It appears that AIX still does not support NULL for 2nd parameter. And old versions of Solaris do not either (although the newest one apparently does).