Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.9k stars 540 forks source link

\X and starter-starter combinations #9995

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#70940 (status was 'resolved')

Searchable as RT70940$

p5pRT commented 14 years ago

From @ikegami

Hello\,

Unicode considers certain sequences of codepoints to represent a single "visual character" (my term). For example\, the two characters "e + combinining acute" are considered the same as the single character "e acute". It is \X's intention to match these visual characters\, but it only matches some of them. It misses combinations which aren't made using combining characters\, but are nonetheless considered to be combining by Unicode.

e-acute​:

$ perl -MUnicode​::Normalize -e'printf "%04X\n"\, ord for split //\, NFD "\x{00E9}"' 0065 0301

$ perl -MUnicode​::Normalize -e'printf "%04X\n"\, ord for split //\, NFC "\x{0065}\x{0301}"' 00E9

$ perl -le'print "\x{0065}\x{0301}" =~ /^\X$/ ||0' 1

Some Korean character​:

$ perl -MUnicode​::Normalize -e'printf "%04X\n"\, ord for split //\, NFD "\x{AC00}"' 1100 1161

$ perl -MUnicode​::Normalize -e'printf "%04X\n"\, ord for split //\, NFC "\x{1100}\x{1161}"' AC00

$ perl -le'print "\x{1100}\x{1161}" =~ /^\X$/ ||0' 0

I came across this while working on a perlio wrapper for Unicode​::Normalize (which is limbo until I fix limitations in PerlIO​::via). I was asked to submit this bug report since it relates to a discussion in progress about \X.

- ELB

p5pRT commented 14 years ago

From @khwilliamson

This patch is available at git​://github.com/khwilliamson/perl.git branch x

It expands the definition of \X to more closely have its intended effect​: to match a logical Unicode character. It will now match the Unicode "extended grapheme cluster"\, as previously discussed on this list.

This entailed changing the algorithm for matching \X in regexec.c\, with additional swashes needed for the various Unicode matching tables required\, so utf8.c was changed to include those. (I'm taking advantage of this patch to add some commented-out proof-of-concept code to it that I forgot to put in the mktables revamp patch. Also\, I've cleaned up some comments in utf8.h)

The new \X passes the extensive Unicode test suite. I have modified mktables to add those tests to the test script it generates. I also extended mktables to be able to more generally handle Unicode test suites for future expansion.

I found that a comment was inappropriately not being output into a few of the mktables generated tables. And I cleaned up the testing for non-ASCII machines.

I'm not sure about perl_clone(). I don't know how it ties in with things; I added the new swash variables to it\, as documented in intrpvar.h\, but I don't know how to test that I didn't do a typo in them.

Things are set up so that when Perl is used on an earlier Unicode release that doesn't have extended grapheme clusters\, the previous definition of \X is used\, possibly with the addition of Korean Hangul syllable matching\, if available in that release.

If this patch is accepted\, pod changes will follow\, but I'm working on other bug fixes at the moment\, which have priority.

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @rgarcia

I would suggest\, considering the timeline\, to postpone that change for 5.13.

2009/12/6 karl williamson \public@​khwilliamson\.com​:

This patch is available at git​://github.com/khwilliamson/perl.git branch x

It expands the definition of \X to more closely have its intended effect​: to match a logical Unicode character.  It will now match the Unicode "extended grapheme cluster"\, as previously discussed on this list.

This entailed changing the algorithm for matching \X in regexec.c\, with additional swashes needed for the various Unicode matching tables required\, so utf8.c was changed to include those.  (I'm taking advantage of this patch to add some commented-out proof-of-concept code to it that I forgot to put in the mktables revamp patch.  Also\, I've cleaned up some comments in utf8.h)

The new \X passes the extensive Unicode test suite.  I have modified mktables to add those tests to the test script it generates.  I also extended mktables to be able to more generally handle Unicode test suites for future expansion.

I found that a comment was inappropriately not being output into a few of the mktables generated tables.  And I cleaned up the testing for non-ASCII machines.

I'm not sure about perl_clone().  I don't know how it ties in with things; I added the new swash variables to it\, as documented in intrpvar.h\, but I don't know how to test that I didn't do a typo in them.

Things are set up so that when Perl is used on an earlier Unicode release that doesn't have extended grapheme clusters\, the previous definition of \X is used\, possibly with the addition of Korean Hangul syllable matching\, if available in that release.

If this patch is accepted\, pod changes will follow\, but I'm working on other bug fixes at the moment\, which have priority.

-- "You don't mean odds and ends\, you mean des curieux et des bouts"\, corrected the manager. -- Terry Pratchett\, Hogfather

p5pRT commented 14 years ago

From @khwilliamson

Rafael Garcia-Suarez wrote​:

I would suggest\, considering the timeline\, to postpone that change for 5.13.

2009/12/6 karl williamson \public@​khwilliamson\.com​:

This patch is available at git​://github.com/khwilliamson/perl.git branch x

It expands the definition of \X to more closely have its intended effect​: to match a logical Unicode character. It will now match the Unicode "extended grapheme cluster"\, as previously discussed on this list.

[snip]

I would personally be disappointed to see this deferred\, as I worked very hard to get it done in what I hoped would be in time for 5.12\, and Jesse had earlier privately suggested that we could put it in rc0/5.11.3 to see if anyone complains. I forgot to mention when I submitted it that it fixes the bug that ZWJ and ZWNJ aren't in the current definition of \X and should be. That bug has been discussed on this list\, but I don't think a bug report was ever officially filed. These being missing affects more than just the Asian languages that I said this patch addressed.

That said\, another consideration is what other Unicode fixes get into 5.12. If the others I'm planning to submit in the next 54 hours get in\, then I would argue that 5.12 can be considered as a release that significantly improves the handling of Unicode\, but with possible backwards compatibility problems (hence the legacy pragma)\, and so this ought to go in as well. This does change current behavior\, although the current behavior is a bug. It is currently impossible to have an isolated mark. If you have a string which has a newline followed by a combining accent\, for example\, the current behavior is the absurd notion that the nl is accented. The new behavior is the more correct interpretation that the mark does not combine with the nl\, but is isolated. If a string begins with a combining accent\, current behavior is that \X won't match anything\, which again is wrong. Input text strings that are being processed legitimately have isolated marks when talking about the marks themselves.

On the other hand\, if the other bug fixes don't get shipped\, then this release isn't one that fixes a lot of Unicode problems\, and this is just one more not fixed.

The other bugs I'm working on are 1) to get rid of a misleading error message\, this could be deferred; and 2) to completely clean up the known issues with ut8ness of a string affecting its semantics. Again\, I plan to submit these patches in the next 54 hours.

p5pRT commented 14 years ago

From @obra

Karl\,

At a first pass through the patch\, it looks like it's one commit which mixes the \X functionality change with a fair number of unrelated changes which cloud the diff. In addition it doesn't look like we ended up with any tests for the new behavior.

Would it be possible to split out the non-\X changes into their own patch and to update the \X changes with a reasonable amount of test coverage? It looks like there are some very reasonable cleanups and code-level documentation improvements that arent' very contentious.

Given the conversation we had about the relatively small likelyhood of breakage caused by this change\, I'm still willing to consider giving it a trial run in a 5.11.x release to see how it does\, but sans-tests it's a little scary.

-Jesse

p5pRT commented 14 years ago

From @khwilliamson

jesse wrote​:

Karl\,

At a first pass through the patch\, it looks like it's one commit which mixes the \X functionality change with a fair number of unrelated changes which cloud the diff. In addition it doesn't look like we ended up with any tests for the new behavior.

Would it be possible to split out the non-\X changes into their own patch and to update the \X changes with a reasonable amount of test coverage? It looks like there are some very reasonable cleanups and code-level documentation improvements that arent' very contentious.

Given the conversation we had about the relatively small likelyhood of breakage caused by this change\, I'm still willing to consider giving it a trial run in a 5.11.x release to see how it does\, but sans-tests it's a little scary.

-Jesse

Actually\, there aren't any non-documentation unrelated portions to the patch\, as far as I can determine. I'm not sure what you mean here. I wonder if it's a problem with my push to github getting old stuff\, even though I created a new branch. I always list every non-typo\, non-comment change in my commit message.

So let's figure it out. I've gone through and looked at all the diffs\, and below talk about each one that isn't just a comment change.

But first\, the apparent lack of tests. The patch delivers the entire test suite for extended grapheme clusters\, furnished by Unicode as part of its database. It is file lib/unicore/auxiliary/GCBTest.txt. The 'Grapheme Cluster Break' property is the Unicode name for the property we want. Perhaps this file is something you thought was unrelated.

I taught mktables to understand this file's syntax\, and to not automatically ignore test files\, like it did previously. In doing so\, I added a few lines so that the other test files that are in the Unicode database would still be ignored and not generate warnings\, and I added commentary about them to the Readme. Currently\, Perl doesn't deliver those files\, but someone could download the entire db.

When you run mktables\, it generates test cases from the database. That way\, a new release of the database doesn't require any recoding on our part. In particular\, this patch caused it to read the GCB test file and generate test cases from it. This required a bunch of code in mktables.   All the generated tests are stored in lib/unicore/TestProp.pl\, and the ones for GCB are at the end of it. The test script t/re/uniprops.t is a wrapper that calls TestProp.pl. It is automatically run with a 'make test'\, and so the complete Unicode test suite for this property is run.   The only tests that mktables generates that aren't driven by the Unicode database are ones (just two right now) that explicitly are in bug reports\, including the one in perl #70940 that this patch addresses.   You can see them even without downloading and running the patch\, at the end of mktables. Test_X is the subroutine that tests \X. The Unicode GCB suite consists of about 300 lines which generates about 900 tests in all.

I did change from a string to an array the way to test this and the other tests on non-ASCII machines. I could pull that out\, but I doubt that that is what you are thinking of.

mktables also generates several new tables for use by the \X code in regexec.c. I added code so that if mktables is run on an earlier Unicode release it generates empty tables or in one release\, uses the identical but differently named tables so that regexec.c will degrade gracefully without caring what Unicode release it is running on. I added code to mktables so that these empty\, essentially placeholder\, tables in older releases won't get listed in the generated .pod file

The only thing I can think of that is not obviously related is that I discovered when I added comments for the new tables that they weren't getting generated\, so I found and fixed where that wasn't happening. It affected just one non-related table besides the new ones.

Each new table requires a one-liner function to access it and store its swash. These were added to embed.fnc\, and defined in utf8.c. I noted in my commit message that I was adding unrelated comments to utf8.[ch] that I should have added before. Some of that is commented out code that you might not notice is in a comment. It should have been included in the initial mktables patch\, as it is proof-of-concept for reading header information now included in all tables\, should we ever decide to use it.

And then there is the code in regexec.c to process the \X and to load in the swashes the first time a \X is encountered.

I also see that I moved a paragraph in the perldelta.pod.

That's all my changes. Any others I suspect are a problem with github or my usage of it.

p5pRT commented 14 years ago

From @obra

Actually\, there aren't any non-documentation unrelated portions to the patch\, as far as I can determine. I'm not sure what you mean here. I wonder if it's a problem with my push to github getting old stuff\, even though I created a new branch. I always list every non-typo\, non-comment change in my commit message.

Changes to comments and typo fixes not related to what's otherwise a functionality change should ideally be in a separate commit.

http​://github.com/khwilliamson/perl/commit/37e2e78edfe0a224b8a615820f46db879584f523 is the only commit I see on this branch from you.

Github shows me the commit message as "qr/\X/ expansion" It sounds like maybe that's part of the issue here. In general\, I'm much happier with patches sent to p5p\, so that everyone can see what's going on at runtime.

So let's figure it out. I've gone through and looked at all the diffs\, and below talk about each one that isn't just a comment change.

But first\, the apparent lack of tests. The patch delivers the entire test suite for extended grapheme clusters\, furnished by Unicode as part of its database. It is file lib/unicore/auxiliary/GCBTest.txt.

Ok. That wasn't obvious to me from reading the diff. That explains a lot about the lack of tests.

From what you've described in your message\, it sounds like there are at least four separate patches here​:

* mktables improvements * utf8.{c\,h} comment updates * adding new unihan test generation * support for \X

Does that sound right?

p5pRT commented 14 years ago

From @khwilliamson

jesse wrote​:

Actually\, there aren't any non-documentation unrelated portions to the patch\, as far as I can determine. I'm not sure what you mean here. I wonder if it's a problem with my push to github getting old stuff\, even though I created a new branch. I always list every non-typo\, non-comment change in my commit message.

Changes to comments and typo fixes not related to what's otherwise a functionality change should ideally be in a separate commit.

http​://github.com/khwilliamson/perl/commit/37e2e78edfe0a224b8a615820f46db879584f523 is the only commit I see on this branch from you.

Github shows me the commit message as "qr/\X/ expansion" It sounds like maybe that's part of the issue here. In general\, I'm much happier with patches sent to p5p\, so that everyone can see what's going on at runtime.

So let's figure it out. I've gone through and looked at all the diffs\, and below talk about each one that isn't just a comment change.

But first\, the apparent lack of tests. The patch delivers the entire test suite for extended grapheme clusters\, furnished by Unicode as part of its database. It is file lib/unicore/auxiliary/GCBTest.txt.

Ok. That wasn't obvious to me from reading the diff. That explains a lot about the lack of tests.

From what you've described in your message\, it sounds like there are at least four separate patches here​:

* mktables improvements * utf8.{c\,h} comment updates * adding new unihan test generation * support for \X

Does that sound right?

It depends on how you count. The utf8.[c h] comment updates are indeed unrelated.

unihan is short for Unicode Han-related (Chinese and similar) stuff\, and this patch has nothing to do with that.

The mktables improvements are to generate the tables needed for \X support and test cases to verify that it works. I don't see how these are really separable.

p5pRT commented 14 years ago

From @obra

Does that sound right?

It depends on how you count.

It's more that we should split that patch up into a series of commits rather than one big one. The exact number doesn't concern me too much ;)

The utf8.[c h] comment updates are indeed unrelated.

unihan is short for Unicode Han-related (Chinese and similar) stuff\, and this patch has nothing to do with that.

Apologies - unihan stuff was right around a bunch of the code you touched.

The mktables improvements are to generate the tables needed for \X support and test cases to verify that it works.

*nod*

I don't see how these are really separable.

You can't have the \X changes without the mktables stuff\, right?

My assumption was that you _could_ have the mktables changes without the new \X code\, right?

Best\, Jesse

--

p5pRT commented 14 years ago

From @khwilliamson

jesse wrote​:

Does that sound right?

It depends on how you count.

It's more that we should split that patch up into a series of commits rather than one big one. The exact number doesn't concern me too much ;)

The utf8.[c h] comment updates are indeed unrelated.

unihan is short for Unicode Han-related (Chinese and similar) stuff\, and this patch has nothing to do with that.

Apologies - unihan stuff was right around a bunch of the code you touched.

The mktables improvements are to generate the tables needed for \X support and test cases to verify that it works.

*nod*

I don't see how these are really separable.

You can't have the \X changes without the mktables stuff\, right?

My assumption was that you _could_ have the mktables changes without the new \X code\, right?

The mktables changes are a prerequisite for the regex \X changes\, but the regex \X changes are a prerequisite for the mktables tests changes (otherwise they fail).

And I don't see any advantage in splitting them. I thought the point of a single patch was to have a single place to go to add or remove everything connected with a particular feature/bug. So you can add/remove them with one fell swoop. Thus these changes seem still to me to be a logical unit.

I knew I was taking a chance with the comment changes\, but knew I could resubmit them else times if necessary\, and if it worked out\, it would save me and the committers time\, with no added risk that I could perceive.

p5pRT commented 14 years ago

From @obra

The mktables changes are a prerequisite for the regex \X changes\, but the regex \X changes are a prerequisite for the mktables tests changes (otherwise they fail).

I'm actually good with that. Tests that fail before functionality gets added and pass afterward show that the functionality does the right thing :)

I knew I was taking a chance with the comment changes\, but knew I could resubmit them else times if necessary\, and if it worked out\, it would save me and the committers time\, with no added risk that I could perceive.

*nod* If we _do_ have to pull \X\, we wouldn't want to kill the comments.
I'm happy to help with the splitting out of the different bits if you're feeling burned out by all this. I promise that I'm not trying to make life hard for you ;)

Best\, Jesse

--

p5pRT commented 14 years ago

From @khwilliamson

jesse wrote​:

The mktables changes are a prerequisite for the regex \X changes\, but the regex \X changes are a prerequisite for the mktables tests changes (otherwise they fail).

I'm actually good with that. Tests that fail before functionality gets added and pass afterward show that the functionality does the right thing :)

I knew I was taking a chance with the comment changes\, but knew I could resubmit them else times if necessary\, and if it worked out\, it would save me and the committers time\, with no added risk that I could perceive.

*nod* If we _do_ have to pull \X\, we wouldn't want to kill the comments.
I'm happy to help with the splitting out of the different bits if you're feeling burned out by all this. I promise that I'm not trying to make life hard for you ;)

Best\, Jesse

But I still haven't heard a reason that the mktables vs regex changes all in support of \X should be considered somehow separate.

With Biology taxonomists\, you have your lumpers and your splitters\, who disagree on where to draw the line on what is a species\, sub-species\, etc. Perhaps you are a splitter and I am a lumper. I am getting that you prefer these things to be split\, but I don't understand why\, and others may not as well\, so in the future I and they won't know where you want to draw the line; so I very may well not draw it where you want me to\, until I can understand and recreate your reasoning.

I have been operating under the premise that there are too few tuits for the work at hand\, so that by not creating a separate patch for comments\, but to wait until I'm working on that file anyway\, I'm saving those people who do the commits some precious time.

p5pRT commented 14 years ago

From @tonycoz

On Wed\, Dec 09\, 2009 at 10​:05​:37PM -0700\, karl williamson wrote​:

jesse wrote​:

I'm actually good with that. Tests that fail before functionality gets added and pass afterward show that the functionality does the right thing :)

But I still haven't heard a reason that the mktables vs regex changes
all in support of \X should be considered somehow separate.

I think the aim is have a patch with tests that expect the correct failure\, but fails because the implementation lacks\, and a patch that then fixes those lacks.

From a QA point of view it makes it simpler to check that your tests actually detect the problem that the implementation changes fix.

Of course\, I could be misreading jesse.

Tony

p5pRT commented 14 years ago

From @demerphq

2009/12/10 jesse \jesse@​fsck\.com​:

The mktables changes are a prerequisite for the regex \X changes\, but the regex \X changes are a prerequisite for the mktables tests changes (otherwise they fail).

I'm actually good with that. Tests that fail before functionality gets added and pass afterward show that the functionality does the right thing :)

This is a departure from our historical policy of wanting tests+patch to be together.

I think it is a good departure\, but I think saying it to someone when they *submit* a patch in the old format is not fair.

Instead id say accept this patch\, possibly with edits\, and then request in future a different scheme. And document the new scheme.

Also\, I told Karl that many people\, like me\, prefer to pull from a repo than deal with patches sent over a legacy protocol like email. (Reminds me of the original modems\, where you had to plug the phone handset into the modem...) So if any criticism is due for this aim it at me.

Incidentally​: I think we should accept both\, and from someone like Karl i feel strongly that you are doing yourself a disservice if you dont use the D part of the DVCS to your advantage.

IOW​: I think that for someone like karl its better to one off do​:

git remote add -f karl $URL_TO_KARLS_REPO

and then occasionally do a `git remote update` to see what he is up to. It is a heck of a lot less hassle than apply patches from a mailbox.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 14 years ago

From @rgarcia

It appears (after days without looking at P5P) that I replied to the wrong mail here. That's what happens when you're in a hurry...

I have now merged your branch x in bleadperl\, thanks!

2009/12/7 Rafael Garcia-Suarez \rgs@​consttype\.org​:

I would suggest\, considering the timeline\, to postpone that change for 5.13.

2009/12/6 karl williamson \public@​khwilliamson\.com​:

This patch is available at git​://github.com/khwilliamson/perl.git branch x

It expands the definition of \X to more closely have its intended effect​: to match a logical Unicode character.  It will now match the Unicode "extended grapheme cluster"\, as previously discussed on this list.

This entailed changing the algorithm for matching \X in regexec.c\, with additional swashes needed for the various Unicode matching tables required\, so utf8.c was changed to include those.  (I'm taking advantage of this patch to add some commented-out proof-of-concept code to it that I forgot to put in the mktables revamp patch.  Also\, I've cleaned up some comments in utf8.h)

The new \X passes the extensive Unicode test suite.  I have modified mktables to add those tests to the test script it generates.  I also extended mktables to be able to more generally handle Unicode test suites for future expansion.

I found that a comment was inappropriately not being output into a few of the mktables generated tables.  And I cleaned up the testing for non-ASCII machines.

I'm not sure about perl_clone().  I don't know how it ties in with things; I added the new swash variables to it\, as documented in intrpvar.h\, but I don't know how to test that I didn't do a typo in them.

Things are set up so that when Perl is used on an earlier Unicode release that doesn't have extended grapheme clusters\, the previous definition of \X is used\, possibly with the addition of Korean Hangul syllable matching\, if available in that release.

If this patch is accepted\, pod changes will follow\, but I'm working on other bug fixes at the moment\, which have priority.

-- "You don't mean odds and ends\, you mean des curieux et des bouts"\, corrected the manager. -- Terry Pratchett\, Hogfather

-- "You don't mean odds and ends\, you mean des curieux et des bouts"\, corrected the manager. -- Terry Pratchett\, Hogfather

p5pRT commented 14 years ago

From @ikegami

Quickly tested. Fixed by 5b79a2243fad631bde9802bdfa8903ea90aca08d for 5.12

p5pRT commented 14 years ago

@ikegami - Status changed from 'open' to 'resolved'

p5pRT commented 14 years ago

From @andk

On Sat\, 12 Dec 2009 09​:57​:52 +0100\, Rafael Garcia-Suarez \rgs@​consttype\.org said​:

  > I have now merged your branch x in bleadperl\, thanks!

Something's wrong. When I test YAML I get lots of this sort of diagnostic output​:

YAML Error​: Error in require YAML​::Dumper - Can't load '/home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so' for module re​: /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so​: undefined symbol​: is_utf8_X_V at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/XSLoader.pm line 70. at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/re.pm line 69 Compilation failed in require at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. BEGIN failed--compilation aborted at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. Compilation failed in require at (eval 7) line 2.

myconfig appended

-- andreas

% git describe v5.11.2-115-g37e2e78 % ./myconfig
Summary of my perl5 (revision 5 version 11 subversion 2) configuration​:  
  Platform​:   osname=linux\, osvers=2.6.30-1-amd64\, archname=x86_64-linux-thread-multi   uname='linux k81 2.6.30-1-amd64 #1 smp sat aug 15 18​:09​:19 utc 2009 x86_64 gnulinux '   config_args='-Dprefix=/home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78 -Dinstallusrbinperl=n -Uversiononly -Dusedevel -des -Ui_db -Dusethreads'   hint=previous\, useposix=true\, d_sigaction=define   useithreads=define\, usemultiplicity=define   useperlio=define\, d_sfio=undef\, uselargefiles=define\, usesocks=undef   use64bitint=define\, use64bitall=define\, uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cc'\, ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\,   optimize='-O2'\,   cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'   ccversion=''\, gccversion='4.3.4'\, gccosandvers=''   intsize=4\, longsize=8\, ptrsize=8\, doublesize=8\, byteorder=12345678   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16   ivtype='long'\, ivsize=8\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=8\, prototype=define   Linker and Libraries​:   ld='cc'\, ldflags =' -fstack-protector -L/usr/local/lib'   libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64   libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat   perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc   libc=/lib/libc-2.9.so\, so=so\, useshrplib=false\, libperl=libperl.a   gnulibc_version='2.9'   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-Wl\,-E'   cccdlflags='-fPIC'\, lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

p5pRT commented 14 years ago

From @khwilliamson

Andreas J. Koenig wrote​:

On Sat\, 12 Dec 2009 09​:57​:52 +0100\, Rafael Garcia-Suarez \rgs@​consttype\.org said​:

I have now merged your branch x in bleadperl\, thanks!

Something's wrong. When I test YAML I get lots of this sort of diagnostic output​:

YAML Error​: Error in require YAML​::Dumper - Can't load '/home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so' for module re​: /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so​: undefined symbol​: is_utf8_X_V at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/XSLoader.pm line 70. at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/re.pm line 69 Compilation failed in require at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. BEGIN failed--compilation aborted at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. Compilation failed in require at (eval 7) line 2.

myconfig appended

I see from the output that somehow the function is_utf8_X_V() is not defined in your build. I just checked\, and it is in embed.fnc and as Perl_is_utf8_X_V() in utf8.c in blead. I don't know how all this hangs together\, but perhaps the make regen all didn't happen.

p5pRT commented 14 years ago

From @rgarcia

2009/12/13 karl williamson \public@​khwilliamson\.com​:

Andreas J. Koenig wrote​:

On Sat\, 12 Dec 2009 09​:57​:52 +0100\, Rafael Garcia-Suarez \rgs@​consttype\.org said​:

 > I have now merged your branch x in bleadperl\, thanks!

Something's wrong. When I test YAML I get lots of this sort of diagnostic output​:

YAML Error​: Error in require YAML​::Dumper - Can't load '/home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so' for module re​: /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so​: undefined symbol​: is_utf8_X_V at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/XSLoader.pm line 70.  at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/re.pm line 69 Compilation failed in require at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. BEGIN failed--compilation aborted at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. Compilation failed in require at (eval 7) line 2.

myconfig appended

I see from the output that somehow the function is_utf8_X_V() is not defined in your build.  I just checked\, and it is in embed.fnc and as Perl_is_utf8_X_V() in utf8.c in blead.  I don't know how all this hangs together\, but perhaps the make regen all didn't happen.

I just fixed that with 5270d770bca421d596c01c6a2ac5038bb781ff93

p5pRT commented 14 years ago

From @khwilliamson

karl williamson wrote​:

Andreas J. Koenig wrote​:

On Sat\, 12 Dec 2009 09​:57​:52 +0100\, Rafael Garcia-Suarez \rgs@​consttype\.org said​:

I have now merged your branch x in bleadperl\, thanks!

Something's wrong. When I test YAML I get lots of this sort of diagnostic output​:

YAML Error​: Error in require YAML​::Dumper - Can't load '/home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so' for module re​: /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/auto/re/re.so​: undefined symbol​: is_utf8_X_V at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/XSLoader.pm line 70. at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/x86_64-linux-thread-multi/re.pm line 69 Compilation failed in require at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. BEGIN failed--compilation aborted at /home/src/perl/repoperls/installed-perls/perl/v5.11.2-115-g37e2e78/lib/5.11.2/B/Deparse.pm line 3447. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Types.pm line 137. Compilation failed in require at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. BEGIN failed--compilation aborted at /home/sand/.cpan/build/YAML-0.70-g3laHF/blib/lib/YAML/Dumper.pm line 8. Compilation failed in require at (eval 7) line 2.

myconfig appended

I see from the output that somehow the function is_utf8_X_V() is not defined in your build. I just checked\, and it is in embed.fnc and as Perl_is_utf8_X_V() in utf8.c in blead. I don't know how all this hangs together\, but perhaps the make regen all didn't happen.

And having just written that\, I did a pull on blead and found that probably Rafael had already fixed this.

It's clear that I don't know how all this hangs together\, nor why it worked previously for me but not you . The comment in embed.fnc says the 'A' flag is for making the function a member of the public API. But I didn't see any reason to document these functions for general use. They are very specialized for a single regex op. I would appreciate someone who knows what's going on to change that comment.

p5pRT commented 14 years ago

From @rgarcia

2009/12/13 karl williamson \public@​khwilliamson\.com​:

It's clear that I don't know how all this hangs together\, nor why it worked previously for me but not you .  The comment in embed.fnc says the 'A' flag is for making the function a member of the public API.  But I didn't see any reason to document these functions for general use. They are very specialized for a single regex op.  I would appreciate someone who knows what's going on to change that comment.

That's because the regexp engine can be compiled separately (re.so -- that's how C\<use re "debug"> is implemented)\, but still needs to access those functions.

p5pRT commented 14 years ago

From @ap

* jesse \jesse@&#8203;fsck\.com [2009-12-10 05​:35]​:

the regex \X changes are a prerequisite for the mktables tests changes (otherwise they fail).

I'm actually good with that. Tests that fail before functionality gets added and pass afterward show that the functionality does the right thing :)

Do keep in mind how this will impact git-bisect and suchlike.

If this is going to be policy\, you probably want to require that the tests be TODO’d in the first commit and armed in the second.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>