cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0
884 stars 84 forks source link

Run without Unicode::GCString ? #303

Closed tdegeus closed 2 years ago

tdegeus commented 2 years ago

I have been preparing a conda feedstock for this library : https://github.com/conda-forge/staged-recipes/pull/16914

A problem that appeared, and that might persist for a while, stems from the fact that Unicode::GCString wraps a compiled library (sombok). Now the issue is that there is currently no way yes to ship the binary for mac's ARM processor (at least not on conda-forge as it currently relies on cross-compilation as Azure does not have (sufficient) hardware yet).

So now the question: is there a way to run without Unicode::GCString (e.g. in a "light" mode) ? Or is there a way to run without the compile sombok library ?

cmhughes commented 2 years ago

Thanks for this.

The GCString module is central to the align at ampersand routine.

It is not possible to run the script without it.

Is it possible to create a standalone executable file for Mac that wouldn't require perl?

cmhughes commented 2 years ago

I should have said: thank you for taking this forward, and for contributing to the project. I really appreciate it :)

cmhughes commented 2 years ago

In case it's helpful, here's a link to the github action that creates the Windows executable:

https://github.com/cmhughes/latexindent.pl/actions/runs/1454209891/workflow

Perhaps something similar could be done to create a Mac executable that doesn't need perl....?

tdegeus commented 2 years ago

I think for sure it should be done on Windows!

Then, not sure if it solves the Mac ARM issue, as the problem is that for now it is cross-compiled, so probably we would hit the same issue.

cmhughes commented 2 years ago

Is there anything I can do to progress this?

tdegeus commented 2 years ago

Other than using a pure-perl library to replace I don't think so ;)

For now the conda-forge package is live and published and works on Linux/macOS, and I think Windows but that is good to test, but not macOS+ARM.

To get it working of macOS+ARM, we can:

  1. Wait: at some point native CI will come solving all issues at once.
  2. Wait: at some point there might be better perl cross-compile support.
  3. Investigate how sombok can be build and install separately making per-unicode-linebreak generic. I've reached out : https://github.com/hatukanezumi/sombok/issues/3 but if you know how to do this it would be great.
cmhughes commented 2 years ago

Thanks, that's helpful.I'm going to note the following here for possible exploration:

https://stackoverflow.com/questions/1326539/how-do-i-find-the-length-of-a-unicode-string-in-perl

https://perldoc.perl.org/perlunicook

If you, or anyone else, feels like exploring, the main file to focus upon is AlignAtAmpersand.pm.

tdegeus commented 2 years ago

Sounds interesting. My perl expertise is not enough to contribute with such a change in finite time. However, I would strongly encourage a perl-only solution. This would render latexindent.pl more robust and completely OS and hardware independent

cmhughes commented 2 years ago

Some encouraging early tests...

use Encode;
my @unicodetests = ("brûlée","Jörg","Größe");
for(@unicodetests){
    print length(Encode::decode('UTF-8', $_)),"\n";
}

gives

6
4
5

as expected... further experimentation needed.

tdegeus commented 2 years ago

Sounds great. Starting preparations : https://github.com/conda-forge/staged-recipes/pull/17143

tdegeus commented 2 years ago

@cmhughes Not sure is Encode solves the problem we set out to solve : https://github.com/dankogai/p5-encode/issues/163 .

cmhughes commented 2 years ago

OK!

The bottom line is: latexindent needs a way to measure Unicode strings.

tdegeus commented 2 years ago

@cmhughes Seems that I misjudged: Encode is part of perl's core, which we do have on all platforms, so using it should solve all problems. Indeed I can simply run your example. So +1 for this change!

tdegeus commented 2 years ago

(Regardless of the issue with which this started: it seems much safer to depend on a core library, Unicode::GCString does not seem to be too actively maintained)

cmhughes commented 2 years ago

OK, that's good to know :)

I'll continue to explore. The above was a good first step :) updates to follow when I have them.

On Thu, 2 Dec 2021, 18:21 Tom de Geus, @.***> wrote:

(Regardless of the issue with which this started: it seems much safer to depend on a core library, Unicode::GCString does not seem to be too actively maintained)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-984881674, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYDPXF6L75Y3YDFCSSDUO62JTANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

cmhughes commented 2 years ago

As of https://github.com/cmhughes/latexindent.pl/commit/ebdcbd903dfde66e8084c9321d5621fc246e882e I believe this is implemented.

Feel free to grab a copy of the develop branch if you'd like to test it. I'm hoping to get a new version released soon.

cmhughes commented 2 years ago

To test:

https://www.cl.cam.ac.uk/~mgk25/ucs/examples/quickbrown.txt

(i searched for 'unicode test string')

tdegeus commented 2 years ago

I can confirm that latexindent.pl works as expected on my document ! So without perl-unicode-linebreak such that it is now fully hardware independent. Thanks a lot !!

The conda-forge package will update with the new release.

tdegeus commented 2 years ago

https://github.com/conda-forge/latexindent.pl-feedstock/pull/1

cmhughes commented 2 years ago

That's great, that ks for confirming.

Thanks also for your contribution to the README.

Do you think the appendix in the documentation could use some details too? If so, feel free to submit another pull request, or to give me something that I can copy & paste into it.

https://latexindentpl.readthedocs.io/en/latest/appendices.html

tdegeus commented 2 years ago

Done : https://github.com/cmhughes/latexindent.pl/pull/311

cmhughes commented 2 years ago

A few test cases are causing issues. I'm working on resolving them, which is delaying the release.... details to follow when I have them.

cmhughes commented 2 years ago

Work will continue on this, but on its own branch https://github.com/cmhughes/latexindent.pl/commit/3044ed32f6c927ccd3c6745ba1c52d1274517dc0

I'm hoping to pick this up in the new year.

tdegeus commented 2 years ago

Just to make sure with what you mean "on its own branch" : I guess that you mean still needing some things like additional unit-tests etc. before merging ? Or do you intend to keep it on a separate branch?

cmhughes commented 2 years ago

The first thing:I need to do more testing.

I think it'll be fine, but I need to dedicate some time to it.

I hope to take it on in 2022.

On Mon, 13 Dec 2021, 17:29 Tom de Geus, @.***> wrote:

Just to make sure with what you mean "on its own branch" : I guess that you mean still needing some things like additional unit-tests etc. before merging ? Or do you intend to keep it on a separate branch?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-992705934, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYHTLHIEKLNOPIOKYVLUQYUQDANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

tdegeus commented 2 years ago

Just a side note, not to pressure you (well ok a little ;)). Using the Makefile.PL we are writing in https://github.com/cmhughes/latexindent.pl/issues/316 I finally also tested if CPAN does provide a macOS ARM package for Unicode::GCString, and the answer is no.

Likewise I finally tested for texlive distribution: same answer.

So on this platform that dependency is really a blocker with the only possible solution hand-compiling the dependency. What would be great is if you at minimum rebase the branch feature/no-GCString to main such that a patch on that platform can be provided on conda

cmhughes commented 2 years ago

Great, many thanks.

I'm hoping to test this more over the coming days. I'm compiling some more questions, which will follow :)

The GCString thing is going to have to wait until this has been finished. Changing that could, if not done carefully, fundamentally break the script, it can't be rushed.

Let's stay focused on this as is, it's going in a nice direction :)

On Wed, 19 Jan 2022, 09:06 Tom de Geus, @.***> wrote:

Just a side note, not to pressure you (well ok a little ;)). Using the Makefile.PL we are writing in #316 https://github.com/cmhughes/latexindent.pl/issues/316 I finally also tested if CPAN does provide a macOS ARM package for Unicode::GCString, and the answer is no.

Likewise I finally tested for texlive distribution: same answer.

So on this platform that dependency is really a blocker with the only possible solution hand-compiling the dependency

— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-1016225829, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYE6FWPTKCKLIYALQYLUWZ5JTANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

cmhughes commented 2 years ago

This response was designed for the pull request, apologies.

paddyroddy commented 2 years ago

This library is great and very glad there's an official pre-commit hook now but I am now running into issues with Unicode::GCString mentioned above on M1 arm mac.

I can't explain why but my M1 (non-pro) mac is able to run it, not sure how I got it to work. But my brand new M1 Pro/max one cannot due to Unicode::GCStirng. The error I descirbed here https://github.com/Perl/perl5/issues/19367

tdegeus commented 2 years ago

It's simply because Unicode::GCString is a compiled module that is not provided for your (and mine) hardware. I've patched the conda package with the fix discussed here, so that you can use without any problem.

Alternatively you could manually compile Unicode::GCString on your system.

khwilliamson commented 2 years ago

I looked up what sombok does. From its README, it appears that it could be replaced by features in modern Perl. \X matches a single grapheme since v5.8. \b{gcb} is another way of looking at that information. \b{lb} can be used to easily implement the Unicode line breaking algorithm. Both the \b{} constructs were introduced in Perl 5.22. In other words, there likely is a pure perl solution to sombok since that time

paddyroddy commented 2 years ago

It's simply because Unicode::GCString is a compiled module that is not provided for your (and mine) hardware. I've patched the conda package with the fix discussed here, so that you can use without any problem.

Alternatively you could manually compile Unicode::GCString on your system.

How do I compile Unicode::GCString manually?

tdegeus commented 2 years ago

(No guarantee, I'm not an expert)

I think that you can do this locally

perl Makefile.PL INSTALLDIRS=vendor NO_PERLLOCAL=1 NO_PACKLIST=1
make
make test
make install VERBINST=1

Reference: https://github.com/conda-forge/perl-unicode-linebreak-feedstock/blob/561a8c44b5a006858dbfeafaf983e6e32b427022/recipe/meta.yaml#L18

paddyroddy commented 2 years ago

(No guarantee, I'm not an expert)

I think that you can do this locally

perl Makefile.PL INSTALLDIRS=vendor NO_PERLLOCAL=1 NO_PACKLIST=1
make
make test
make install VERBINST=1

Reference: https://github.com/conda-forge/perl-unicode-linebreak-feedstock/blob/561a8c44b5a006858dbfeafaf983e6e32b427022/recipe/meta.yaml#L18

gave it a go, no luck

cmhughes commented 2 years ago

I need to explore the following https://stackoverflow.com/questions/3945583/how-can-i-conditionally-use-a-module-in-perl

tdegeus commented 2 years ago

Sounds reasonable if the two libraries are not equivalent (I really cannot judge). Fortunately, it seems rather straightforward in perl though. I you support both you should just wonder what should be the default.

cmhughes commented 2 years ago

The default is unclear to me at this point. I need to get a working version :)

I also need to look at https://perldoc.perl.org/if and other results from "perl script load module conditionally"

tdegeus commented 2 years ago

(For info, I have been using the feature/no-GCstring path for a while now without any problems. There were also no complaints on the conda feedstock that has been using this patch since the beginning)

cmhughes commented 2 years ago

Some progress on this, I think :) https://github.com/cmhughes/latexindent.pl/commit/0b8f7b916e73d069fe9ac25056007fd1f78c3d13 still on its own branch, for the moment.

I'm not sure if you can test the following perl script, @tdegeus ...? In particular, I'd like to know what the following calls give you:

perl test-gcstring-load.pl
perl test-gcstring-load.pl --GCString

I'd hope that the first one works, and that the second one does not work (because you don't have Unicode::GCString installed/available).

test-gcstring-load.pl

use strict;
use warnings;
use Getopt::Long;

# get the options
my %switches = ();

GetOptions (
    "GCString"=>\$switches{GCString},
);

# conditionally load the GCString module
eval "use Unicode::GCString" if $switches{GCString};

if ($switches{GCString}){
    print "GCString *is* loaded\n";
} else {
    print "GCString is *not* loaded\n";
}
exit (0);
tdegeus commented 2 years ago

I can confirm that

perl test-gcstring-load.pl

runs without errors, printing

GCString is *not* loaded

However,

perl test-gcstring-load.pl --GCString

also runs without errors, printing

GCString *is* loaded

while when I put

Unicode::GCString;

on the top of the string I get as error

Can't locate Unicode/GCString.pm in @INC (you may need to install the Unicode::GCString module)
tdegeus commented 2 years ago

Furthermore,

use if 0, "Unicode::GCString";  # -> runs without error
use if 1, "Unicode::GCString";  # -> throws with "Can't locate Unicode/GCString.pm in @INC"
use if $switches{GCString}, "Unicode::GCString";  # -> runs without errors regardless of the command line argument (even though the correct string is printed)
tdegeus commented 2 years ago

When I try to run latexindent.pl from the feature/no-GCString branch

/my/path/latexindent.pl -l -m -s -w main.tex

runs successfully, while

/my/path/latexindent.pl -l -m -s -w --GCString main.tex

throws with

Can't locate object method "new" via package "Unicode::GCString" (perhaps you forgot to load "Unicode::GCString"?) at /my/path/latexindent.pl/LatexIndent/AlignmentAtAmpersand.pm line 1595
tdegeus commented 2 years ago

So I guess that all is good there!

cmhughes commented 2 years ago

Great, thanks for your time on this!

I think that everything you report is as expected (which isn't what I originally thought).

The eval statement is called at 'run' time, I think, whereas the use statement is called at 'compile' time.

The script can continue even if the Eval statement fails.

The script can not continue if the use statement fails.

I'll get this documented and hopefully merged and then released :)

tdegeus commented 2 years ago

Awesome! Looking forward to it!

cmhughes commented 2 years ago

This is implemented and documented as of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17

This should mean that you no longer need the feature/no-GCString branch. Let me know once you no longer need it, and I'll delete it. Thanks for your help!

tdegeus commented 2 years ago

Works like a charm, thanks a million 🙏

cmhughes commented 2 years ago

Great to hear. I'll plan to delete the feature branch.

On Fri, 25 Mar 2022, 17:13 Tom de Geus, @.***> wrote:

Works like a charm, thanks a million 🙏

— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-1079232854, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYAGFGWGLWBZ7OMD4OLVBXX3JANCNFSM5H6KUDFA . You are receiving this because you modified the open/close state.Message ID: @.***>

cmhughes commented 2 years ago

feature/no-GCString deleted :)