Closed tdegeus closed 2 years ago
Thanks for this.
The GCString module is central to the align at ampersand routine.
It is not possible to run the script without it.
Is it possible to create a standalone executable file for Mac that wouldn't require perl?
I should have said: thank you for taking this forward, and for contributing to the project. I really appreciate it :)
In case it's helpful, here's a link to the github action that creates the Windows executable:
https://github.com/cmhughes/latexindent.pl/actions/runs/1454209891/workflow
Perhaps something similar could be done to create a Mac executable that doesn't need perl....?
I think for sure it should be done on Windows!
Then, not sure if it solves the Mac ARM issue, as the problem is that for now it is cross-compiled, so probably we would hit the same issue.
Is there anything I can do to progress this?
Other than using a pure-perl library to replace I don't think so ;)
For now the conda-forge package is live and published and works on Linux/macOS, and I think Windows but that is good to test, but not macOS+ARM.
To get it working of macOS+ARM, we can:
Thanks, that's helpful.I'm going to note the following here for possible exploration:
https://stackoverflow.com/questions/1326539/how-do-i-find-the-length-of-a-unicode-string-in-perl
https://perldoc.perl.org/perlunicook
If you, or anyone else, feels like exploring, the main file to focus upon is AlignAtAmpersand.pm.
Sounds interesting. My perl expertise is not enough to contribute with such a change in finite time. However, I would strongly encourage a perl-only solution. This would render latexindent.pl
more robust and completely OS and hardware independent
Some encouraging early tests...
use Encode;
my @unicodetests = ("brûlée","Jörg","Größe");
for(@unicodetests){
print length(Encode::decode('UTF-8', $_)),"\n";
}
gives
6
4
5
as expected... further experimentation needed.
Sounds great. Starting preparations : https://github.com/conda-forge/staged-recipes/pull/17143
@cmhughes Not sure is Encode solves the problem we set out to solve : https://github.com/dankogai/p5-encode/issues/163 .
OK!
The bottom line is: latexindent needs a way to measure Unicode strings.
@cmhughes Seems that I misjudged: Encode is part of perl's core, which we do have on all platforms, so using it should solve all problems. Indeed I can simply run your example. So +1 for this change!
(Regardless of the issue with which this started: it seems much safer to depend on a core library, Unicode::GCString does not seem to be too actively maintained)
OK, that's good to know :)
I'll continue to explore. The above was a good first step :) updates to follow when I have them.
On Thu, 2 Dec 2021, 18:21 Tom de Geus, @.***> wrote:
(Regardless of the issue with which this started: it seems much safer to depend on a core library, Unicode::GCString does not seem to be too actively maintained)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-984881674, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYDPXF6L75Y3YDFCSSDUO62JTANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
As of https://github.com/cmhughes/latexindent.pl/commit/ebdcbd903dfde66e8084c9321d5621fc246e882e I believe this is implemented.
Feel free to grab a copy of the develop
branch if you'd like to test it. I'm hoping to get a new version released soon.
To test:
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/quickbrown.txt
(i searched for 'unicode test string')
I can confirm that latexindent.pl works as expected on my document ! So without perl-unicode-linebreak such that it is now fully hardware independent. Thanks a lot !!
The conda-forge package will update with the new release.
That's great, that ks for confirming.
Thanks also for your contribution to the README.
Do you think the appendix in the documentation could use some details too? If so, feel free to submit another pull request, or to give me something that I can copy & paste into it.
https://latexindentpl.readthedocs.io/en/latest/appendices.html
A few test cases are causing issues. I'm working on resolving them, which is delaying the release.... details to follow when I have them.
Work will continue on this, but on its own branch https://github.com/cmhughes/latexindent.pl/commit/3044ed32f6c927ccd3c6745ba1c52d1274517dc0
I'm hoping to pick this up in the new year.
Just to make sure with what you mean "on its own branch" : I guess that you mean still needing some things like additional unit-tests etc. before merging ? Or do you intend to keep it on a separate branch?
The first thing:I need to do more testing.
I think it'll be fine, but I need to dedicate some time to it.
I hope to take it on in 2022.
On Mon, 13 Dec 2021, 17:29 Tom de Geus, @.***> wrote:
Just to make sure with what you mean "on its own branch" : I guess that you mean still needing some things like additional unit-tests etc. before merging ? Or do you intend to keep it on a separate branch?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-992705934, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYHTLHIEKLNOPIOKYVLUQYUQDANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Just a side note, not to pressure you (well ok a little ;)). Using the Makefile.PL
we are writing in https://github.com/cmhughes/latexindent.pl/issues/316 I finally also tested if CPAN does provide a macOS ARM package for Unicode::GCString
, and the answer is no.
Likewise I finally tested for texlive
distribution: same answer.
So on this platform that dependency is really a blocker with the only possible solution hand-compiling the dependency.
What would be great is if you at minimum rebase the branch feature/no-GCString
to main such that a patch on that platform can be provided on conda
Great, many thanks.
I'm hoping to test this more over the coming days. I'm compiling some more questions, which will follow :)
The GCString thing is going to have to wait until this has been finished. Changing that could, if not done carefully, fundamentally break the script, it can't be rushed.
Let's stay focused on this as is, it's going in a nice direction :)
On Wed, 19 Jan 2022, 09:06 Tom de Geus, @.***> wrote:
Just a side note, not to pressure you (well ok a little ;)). Using the Makefile.PL we are writing in #316 https://github.com/cmhughes/latexindent.pl/issues/316 I finally also tested if CPAN does provide a macOS ARM package for Unicode::GCString, and the answer is no.
Likewise I finally tested for texlive distribution: same answer.
So on this platform that dependency is really a blocker with the only possible solution hand-compiling the dependency
— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-1016225829, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYE6FWPTKCKLIYALQYLUWZ5JTANCNFSM5H6KUDFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you were mentioned.Message ID: @.***>
This response was designed for the pull request, apologies.
This library is great and very glad there's an official pre-commit
hook now but I am now running into issues with Unicode::GCString
mentioned above on M1 arm mac.
I can't explain why but my M1 (non-pro) mac is able to run it, not sure how I got it to work. But my brand new M1 Pro/max one cannot due to Unicode::GCStirng
. The error I descirbed here https://github.com/Perl/perl5/issues/19367
It's simply because Unicode::GCString
is a compiled module that is not provided for your (and mine) hardware. I've patched the conda package with the fix discussed here, so that you can use without any problem.
Alternatively you could manually compile Unicode::GCString
on your system.
I looked up what sombok does. From its README, it appears that it could be replaced by features in modern Perl. \X matches a single grapheme since v5.8. \b{gcb} is another way of looking at that information. \b{lb} can be used to easily implement the Unicode line breaking algorithm. Both the \b{} constructs were introduced in Perl 5.22. In other words, there likely is a pure perl solution to sombok since that time
It's simply because
Unicode::GCString
is a compiled module that is not provided for your (and mine) hardware. I've patched the conda package with the fix discussed here, so that you can use without any problem.Alternatively you could manually compile
Unicode::GCString
on your system.
How do I compile Unicode::GCString
manually?
(No guarantee, I'm not an expert)
I think that you can do this locally
perl Makefile.PL INSTALLDIRS=vendor NO_PERLLOCAL=1 NO_PACKLIST=1
make
make test
make install VERBINST=1
(No guarantee, I'm not an expert)
I think that you can do this locally
perl Makefile.PL INSTALLDIRS=vendor NO_PERLLOCAL=1 NO_PACKLIST=1 make make test make install VERBINST=1
gave it a go, no luck
I need to explore the following https://stackoverflow.com/questions/3945583/how-can-i-conditionally-use-a-module-in-perl
Sounds reasonable if the two libraries are not equivalent (I really cannot judge). Fortunately, it seems rather straightforward in perl though. I you support both you should just wonder what should be the default.
The default is unclear to me at this point. I need to get a working version :)
I also need to look at https://perldoc.perl.org/if and other results from "perl script load module conditionally"
(For info, I have been using the feature/no-GCstring path for a while now without any problems. There were also no complaints on the conda feedstock that has been using this patch since the beginning)
Some progress on this, I think :) https://github.com/cmhughes/latexindent.pl/commit/0b8f7b916e73d069fe9ac25056007fd1f78c3d13 still on its own branch, for the moment.
I'm not sure if you can test the following perl script, @tdegeus ...? In particular, I'd like to know what the following calls give you:
perl test-gcstring-load.pl
perl test-gcstring-load.pl --GCString
I'd hope that the first one works, and that the second one does not work (because you don't have Unicode::GCString
installed/available).
use strict;
use warnings;
use Getopt::Long;
# get the options
my %switches = ();
GetOptions (
"GCString"=>\$switches{GCString},
);
# conditionally load the GCString module
eval "use Unicode::GCString" if $switches{GCString};
if ($switches{GCString}){
print "GCString *is* loaded\n";
} else {
print "GCString is *not* loaded\n";
}
exit (0);
I can confirm that
perl test-gcstring-load.pl
runs without errors, printing
GCString is *not* loaded
However,
perl test-gcstring-load.pl --GCString
also runs without errors, printing
GCString *is* loaded
while when I put
Unicode::GCString;
on the top of the string I get as error
Can't locate Unicode/GCString.pm in @INC (you may need to install the Unicode::GCString module)
Furthermore,
use if 0, "Unicode::GCString"; # -> runs without error
use if 1, "Unicode::GCString"; # -> throws with "Can't locate Unicode/GCString.pm in @INC"
use if $switches{GCString}, "Unicode::GCString"; # -> runs without errors regardless of the command line argument (even though the correct string is printed)
When I try to run latexindent.pl
from the feature/no-GCString
branch
/my/path/latexindent.pl -l -m -s -w main.tex
runs successfully, while
/my/path/latexindent.pl -l -m -s -w --GCString main.tex
throws with
Can't locate object method "new" via package "Unicode::GCString" (perhaps you forgot to load "Unicode::GCString"?) at /my/path/latexindent.pl/LatexIndent/AlignmentAtAmpersand.pm line 1595
So I guess that all is good there!
Great, thanks for your time on this!
I think that everything you report is as expected (which isn't what I originally thought).
The eval
statement is called at 'run' time, I think, whereas the use
statement is called at 'compile' time.
The script can continue even if the Eval statement fails.
The script can not continue if the use statement fails.
I'll get this documented and hopefully merged and then released :)
Awesome! Looking forward to it!
This is implemented and documented as of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17
This should mean that you no longer need the feature/no-GCString
branch. Let me know once you no longer need it, and I'll delete it. Thanks for your help!
Works like a charm, thanks a million 🙏
Great to hear. I'll plan to delete the feature branch.
On Fri, 25 Mar 2022, 17:13 Tom de Geus, @.***> wrote:
Works like a charm, thanks a million 🙏
— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/303#issuecomment-1079232854, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYAGFGWGLWBZ7OMD4OLVBXX3JANCNFSM5H6KUDFA . You are receiving this because you modified the open/close state.Message ID: @.***>
feature/no-GCString
deleted :)
I have been preparing a conda feedstock for this library : https://github.com/conda-forge/staged-recipes/pull/16914
A problem that appeared, and that might persist for a while, stems from the fact that Unicode::GCString wraps a compiled library (sombok). Now the issue is that there is currently no way yes to ship the binary for mac's ARM processor (at least not on conda-forge as it currently relies on cross-compilation as Azure does not have (sufficient) hardware yet).
So now the question: is there a way to run without Unicode::GCString (e.g. in a "light" mode) ? Or is there a way to run without the compile sombok library ?