OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

Wrong license? #96

Open: stweil opened this issue 1 year ago

stweil commented 1 year ago

The cropping part uses LSD, and lsd.cpp is licensed under AGPL 3. Linking AGPL 3 code into a program requires the final product to be licensed under AGPL 3, too.

tfmorris commented 1 year ago

The problem is actually upstream in the pylsd package, so I've opened https://github.com/primetang/pylsd/issues/20.

There's an LSD implementation in OpenCV which may represent a possible alternative.

bertsky commented 1 year ago

> There's an LSD implementation in OpenCV which may represent a possible alternative.

Yes – the problem is that it's not nearly as good, and has next to no parameters for adaptation.

For ocrd-anybaseocr-crop, we rely heavily on this particular detector: both the detection parameters we set and the kind of line segment candidates we expect (so we can group and rate them in a meaningful way) are tuned to it. And it's the only good cropper for historic prints we have at the moment...

stweil commented 1 year ago

It's also possible to fix the license either for ocrd-anybaseocr-crop only (that might require splitting the repository) or for the whole of ocrd_anybaseocr. I think OCR-D processors licensed under GPL or AGPL are permitted.

stweil commented 1 year ago

@kba, @bertsky, this is a serious legal issue which should be fixed soon. If @primetang is not willing or able to fix the problem in the original code, it must be fixed in the fork and here.

bertsky commented 1 year ago

Sure, let's wait for his commentary first, then if necessary switch to AGPL for the whole lot.

(But perhaps we should ask the original contributors if that's ok – @n00blet @mahmed1995 @mjenckel @khurramHashmi. Otherwise we must indeed split up the repo.)

tfmorris commented 1 year ago

@stweil Thanks for resurrecting this. @primetang isn't going to be able to help with the license, because the code actually belongs to @rafael-grompone-von-gioi. The only thing Gefu Tang added to the code was a hacky function to return data via a file. The original code is available here as a supplement to this journal article.

On the plus side, @rafael-grompone-von-gioi has shown some willingness to relicense code in the past for the OpenCV project, so perhaps he'd be willing to do so in this case.

I missed the fact that the immediate upstream is actually https://github.com/kba/pylsd which generates the PyPI package called ocrd-fork-pylsd (not sure why it's not included in the OCR-D org), so the license should be fixed there, whether or not the upstream license gets fixed.

Consideration might also be given to updating to LSD 1.6 and switching to a memory-based API instead of the current file-based hack.

bertsky commented 1 year ago

@tfmorris thanks for clarifying this, and for https://github.com/kba/pylsd/pull/4!

> Consideration might also be given to updating to 1.6 and switching to a memory-based API instead of the current file-based hack.

Oh, I did not notice that before. (I do remember wondering why the standalone LSD tests looked different from my pylsd results, but I did not investigate.) Quite a significant difference! This needs to be tested thoroughly with regard to its impact on parameters and postprocessing in ocrd-anybaseocr-crop...

> called ocrd-fork-pylsd (not sure why it's not included in the OCR-D org),

Agreed, would make more sense – @kba?

kba commented 1 year ago

> called ocrd-fork-pylsd (not sure why it's not included in the OCR-D org),

> Agreed, would make more sense – @kba?

Sure, moving to OCR-D is not a problem. I did not realize that we would depend on this fork for so long, hence the PyPI name. Changing the license is also straightforward; thanks @stweil and @tfmorris for raising this concern.

Fixing the build process and updating the API will take longer.

stweil commented 4 months ago

Is there anything new regarding this issue? If not, we should fix the license by switching to AGPL 3 soon.