[PATCH][docs]Typeglobs: add documentation #5256

Closed p5pRT closed 20 years ago

p5pRT commented 22 years ago

Migrated from rt.perl.org#8839 (status was 'resolved')

Searchable as RT8839$

p5pRT commented 22 years ago

From tagunov@motor.ru

This is a bug report for perl from "Anton Tagunov" <tagunov@motor.ru>, generated with the help of perlbug 1.33 running under perl v5.7.3.


Hello, developers! Very little time has passed since I started learning Perl, and so far the docs have helped me well to get up to speed with a new language. :-)

But there was one subject that remained completely mysterious to me: the Typeglobs. I was completely stuck at understanding them until

sthoenna@efn.org (Yitzchak Scott-Thoennes) (bs"d) and Mark-Jason Dominus <mjd@plover.com>

really helped me out :-)

I looked at what I had come to understand, and it seemed to me that the docs could probably say more about Typeglobs. Not that they are unavoidable nowadays, now that we have hard references and open my $fh, but I believe there is still code out there that uses them, and newcomers will probably have to understand and debug it. Moreover, understanding the Exporter and the like is hardly possible without a clear picture of how the Typeglob machinery works. And finally, it's just nice to understand how it all works, at least for historical reasons :-)

The day after I got the mails from Yitzchak and Mark-Jason I tried to rework my understanding into a reasonably solid piece of documentation. Here it is! I believe the material is quite raw; probably I should have written typeglob instead of Typeglob in the middle of sentences. I always tend to be too wordy when I get to explaining things, so maybe this can be abridged, and maybe some parts are superfluous; any remarks are welcome! The general idea is to make this fuel for a better description of Typeglobs.

I see a place for this in perlmod.pod just before

=head2 Package Constructors and Destructors

but have no reasons to object against putting it somewhere else. (To a separate perlglobtut.pod?)

With best regards and warmest wishes, - Anton Tagunov


Don't be misled by C<print(*main::foo)> printing C<'*main::foo'> and C<*main::foo eq 'main::foo'> evaluating to C<1>. It's best to imagine C<*main::foo> as an instance of a special complex datatype. Let us use C\ syntax:

  TYPE Typeglob = RECORD
    SCALAR, ARRAY, HASH, CODE, IO : REFERENCE;
    PACKAGE, NAME : STRING /*immutable*/;
  END;

The following almost equivalent expressions serve as variables strictly typed to Typeglob:

  *main::foo

  $main::{foo}
  $main::{'foo'}

(The first one is always of the Typeglob type, and the latter may be either L\ or of the Typeglob type.) Typeglob assignments are always done by reference and never by value.

As Perl code is compiled and identifiers are encountered, the C<< %<package-name>:: >> hash is filled with (references to) newly created Typeglobs. The same identifier in the same package never causes a Typeglob to be created twice.

  sub foo
  $foo %foo @foo &foo *foo foo

all trigger Typeglob creation, while

  $main::{foo}
  $main::{'foo'}

don't.

Typeglobs form a separate scalar data type in Perl and are allowed everywhere regular scalars are allowed. They may become values of scalar variables, members of complex data structures, be passed in and out of functions (as parameters and results), be assigned to each other and back to the C<*main::foo>/C<$main::{foo}> expressions (equivalent expressions are grouped together):

  *main::foo = *main::bar ;
  *main::foo = $main::{bar} ;
  $main::{foo} = *main::bar ;
  $main::{foo} = $main::{bar} ;

  $a = *main::foo ;
  $a = $main::{foo} ;

  $c = [ 0, 1, *other::baz ];

  &examine( *other::baz ) ;
  &examine( $other::{'baz'} ) ;

=for comment Are there any ways but the natural one and localizing *foo to create Typeglob objects? Probably no, spell it out here?

All Typeglob assignments are done I<by reference>. This means that after any assignment of a Typeglob value both the source and the destination share the same Typeglob object. The same holds true if a typeglob is passed into or out of a function.
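The sharing described above can be sketched with a glob-to-glob assignment. This is only an illustration; the variable names are made up for the example and do not come from the patch:

```perl
use strict;
use warnings;

# illustrative package variables (names are assumptions for this sketch)
our ($foo, $bar, @foo, @bar);
$bar = 'scalar in bar';
@bar = (1, 2, 3);

*foo = *bar;              # glob-to-glob assignment: the slots are now shared

$foo = 'changed via foo'; # writes to the scalar that $bar also names
push @foo, 4;             # appends to the array that @bar also names

print "$bar\n";
print "@bar\n";
```

After the assignment, writing through either name is visible through the other, which is the behaviour the paragraph calls "by reference".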

If you have an expression that evaluates to a Typeglob value you may access its slots via the following notation:

  ${expr}   # SCALAR
  %{expr}   # HASH
  @{expr}   # ARRAY
  $#{expr}  # ARRAY, access the last element index
  &{expr}   # CODE
  expr      # IO handle

The limitations imposed on the expressions are the same as those imposed on expressions evaluating to hard references that are to be dereferenced via S<C<$, %, @, $#, &>>, see L<perlref>.

  # provided that the 'main::foo' identifier has been seen by the
  # compiler till the current moment *main::foo and $main::{foo}
  # are strictly equivalent (modulo a slight the performance)
  #
  # that's why expressions in each of the following pairs are
  # equivalent
  #
  ${*main::foo}          ${$main::{foo}}          # same as $foo
  %{*main::foo}          %{$main::{foo}}          # same as %foo
  @{*main::foo}          @{$main::{foo}}          # same as @foo
  &{*main::foo}          &{$main::{foo}}          # same as &foo
  readline(*main::foo)   readline($main::{foo})   # readline(foo)

  # an example with a complex data structure
  $main::foo='foo';
  @other::bar=(0,1,2,3);
  $outer{OU} = [ 14,
                 {
                   IN => *main::foo
                 },
                 $other::{bar}
               ];
  print ${$outer{OU}[1]{IN}}, $#{$outer{OU}[2]}; #prints foo3

  #an example of passing Typeglobs in/out of a function
  our ($foo,$bar)=('foo','bar');
  sub a{ (*bar,${shift()}) }
  my @a=&a(*foo);
  print ${$a[0]}, $a[1];
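A self-contained variant of the slot-access notation, runnable under C<use strict>, might look like the sketch below. The helper name C<describe> and the hash C<%registry> are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

our $foo = 'hello';
our @foo = (10, 20, 30);
our %foo = (key => 'value');
sub foo { return 'called foo' }

my $glob     = *foo;               # a typeglob stored in an ordinary scalar
my %registry = (entry => *foo);    # and inside a data structure

# hypothetical helper: read each slot of whatever typeglob is passed in
sub describe {
    my ($g) = @_;
    return join ' ', ${$g}, scalar(@{$g}), ${$g}{key}, &{$g}();
}

print describe($glob), "\n";
print describe($registry{entry}), "\n";
```

Both calls reach the same package variables, because the scalar and the hash value each hold the same typeglob.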

Another notation related to typeglobs is

  *foo{CODE}    # equivalent to \&foo
  *foo{SCALAR}  # equivalent to \$foo
  *foo{HASH}    # equivalent to \%foo
  *foo{ARRAY}   # equivalent to \@foo
  *foo{CODE}    # equivalent to \&foo
  *foo{IO}      # equivalent to \foo

It allows one to obtain hard references to the slots of the C<*main::foo> Typeglob variable. Unlike the previous one, this notation works only with a literal C<*foo> and is not applicable to general expressions evaluating to Typeglobs. You can partially bypass this limitation and obtain references to the slots of a Typeglob by doing:

  # let's assume my $v=\*foo;
  \${expr}   # then \$$v gets \$foo
  \%{expr}   # then \%$v gets \%foo
  \@{expr}   # then \@$v gets \@foo
  \&{expr}   # then \&$v gets \&foo

but there's no workaround to obtain a reference to the C<IO> slot (C<\foo> in our example).

Moreover, the C<*foo{IO}> notation has issues covered/to be covered in L\ and L\, and its use is discouraged in favour of the L<open(my $foo)|perlfunc/open>-based approach.

  local *foo;

has the effect of localizing C<*main::foo>, which is an optimized shorthand for C<$main::{foo}>. A new temporary Typeglob is created and assigned to C<*main::foo==$main::{foo}>. All previously made assignments are not affected (see the samples below). Because assignments from C<*main::foo==$main::{foo}> operate on references, doing something like C<$save=*main::foo> or C<return *main::foo> will save a reference to the temporary Typeglob. This keeps it (and its slots) accessible via the saved reference after the original value of C<*main::foo==$main::{foo}> is restored, in the same way that hard references to lexically/dynamically scoped variables make these variables outlive their scope. If you're curious you may track the Typeglobs' C\ in the regular way with L<Devel::Peek|Devel::Peek>, the same way you do it for scalars, arrays, hashes, etc.

Returning a temporary Typeglob from a function is used by the infamous C<{local *FH; open FH, 'zzz'; return *FH;}>. (Please refer to L\ for the modern way of returning a reference to a file handle from a function.)
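For comparison, the lexical-filehandle style mentioned above can be sketched as follows. The helper name C<open_lines> is hypothetical, and an in-memory file is used only so that the example does not depend on a file existing on disk:

```perl
use strict;
use warnings;

# hypothetical helper: open an in-memory "file" and hand back the handle
sub open_lines {
    my ($text_ref) = @_;
    open(my $fh, '<', $text_ref) or die "cannot open in-memory file: $!";
    return $fh;    # the handle stays open while a reference to it survives
}

my $data = "first\nsecond\n";
my $fh   = open_lines(\$data);
chomp(my @lines = <$fh>);
close $fh;

print scalar(@lines), " lines, first is '$lines[0]'\n";
```

The value returned here is a reference to an anonymous typeglob, so no package symbol needs to be localized.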

C<\*main::foo> is equivalent to C<\$main::{foo}>, for what it's worth, and operates just as one would expect:

  $r = \*foo;
  $${\*foo}   # access $foo
  $$$r        # access $foo
  %$$r        # access %foo
  # compare:
  $rr = \$foo;
  $$rr        # access $foo

As C<\*main::foo> refers to a value in the C<%main::> cache, it sees the localized version if C<*main::foo> has been localized.

C<*main::foo> evaluates to C<'*main::foo'> when being C\-ed or used in string operations, and to C<0> in mathematical ones. C<*main::foo> is not a string and conversion to a string is irreversible: C<${'*main::foo'}> is just a funny way to write C<${main::foo}>, a symbolic reference (using it naturally fails under C<use strict 'refs'>). Typeglobs are of no more use than hard references as hash keys: C<*main::foo> gets converted to C<'*main::foo'> and the latter is used as the key. A couple more examples:

  my $a=*::foo;          # save (the reference to) the
                         # current value of $::{foo}
  $::foo='global';
  print $$a,"\n";        # prints 'global'
  {
    local *foo;
    $::foo='local';
    print $$a,"\n";      # prints 'global'
  }

  my $a=\*::foo;         # get a reference to $::{foo}
  $::foo='global';
  print $$$a,"\n";       # prints 'global'
  {
    local *foo;
    $::foo='local';
    print $$$a,"\n";     # prints 'local'
  }

  # watch the Typeglob destruction, prints 0 1 D 2
  package Foo;
  sub new { bless {},shift; }
  sub DESTROY { print 'D '; }
  package main; our $foo;
  my $a=*foo;            # save (the reference to) the initial
                         # typeglob for FOO, now it has two
                         # references to it: $::{FOO} and $a
  $foo=Foo->new();
  print '0 ';
  *foo=*boo;             # one reference to that left, $a
  print '1 ';
  undef $a;              # no more references to that typeglob
                         # remain, so now it will be destroyed
                         # together with the object, 'D' is
                         # printed
  print '2 ';



Flags​:   category=docs   severity=low


Site configuration information for perl v5.7.3​:

Configured by anthony at Mon Mar 11 18​:43​:11 2002.

Summary of my perl5 (revision 5 undef) configuration​:   Platform​:   osname=MSWin32\, osvers=4.0\, archname=MSWin32-x86-multi-thread   uname=''   config_args='undef'   hint=recommended\, useposix=true\, d_sigaction=undef   usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define   useperlio=define d_sfio=undef uselargefiles=undef usesocks=undef   use64bitint=undef use64bitall=undef uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cl'\, ccflags ='-nologo -Gf -W3 -O1 -MD -DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX'\,   optimize='-O1 -MD -DNDEBUG'\,   cppflags='-DWIN32'   ccversion='undef'\, gccversion=''\, gccosandvers='undef'   intsize=4\, longsize=4\, ptrsize=4\, doublesize=8\, byteorder=1234   d_longlong=undef\, longlongsize=8\, d_longdbl=define\, longdblsize=10   ivtype='long'\, ivsize=4\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=4   alignbytes=8\, prototype=define   Linker and Libraries​:   ld='link'\, ldflags ='-nologo -nodefaultlib -release -libpath​:"c​:\perl15173\lib\CORE" -machine​:x86'   libpth=e​:\apps\ds40\VC\lib   libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib   perllibs=undef   libc=msvcrt.lib\, so=dll\, useshrplib=yes\, libperl=perl57.lib   Dynamic Linking​:   dlsrc=dl_win32.xs\, dlext=dll\, d_dlsymun=undef\, ccdlflags=' '   cccdlflags=' '\, lddlflags='-dll -nologo -nodefaultlib -release -libpath​:"c​:\perl15173\lib\CORE" -machine​:x86'

Locally applied patches​:   DEVEL15172


@​INC for perl v5.7.3​:   c​:/perl15173/lib   c​:/perl15173/site/lib   .


Environment for perl v5.7.3​:   HOME=C​:\   LANG (unset)   LANGUAGE (unset)   LC_ALL=EN_US   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=e​:\apps\ds40\SharedIDE\BIN;e​:\apps\ds40\VC\BIN;e​:\apps\ds40\VC\BIN\WINNT;E​:\apps\ibm\vaj\eab\bin;C​:\usr\local\bin\;e​:\Program Files\ibm\gsk5\lib;E​:\APPS\ROSE\RATION~1\NUTCROOT\bin;E​:\APPS\ROSE\RATION~1\NUTCROOT\bin\x11;E​:\APPS\ROSE\RATION~1\NUTCROOT\mksnt;e​:\java\sun\java131\bin;e​:\apps\vbroker\jre\Bin;e​:\apps\vbroker\Bin;C​:\WINNT\system32;C​:\WINNT;c​:\util;E​:\apps\CacheSys\Bin;C​:\Program Files\rksupport;C​:\WINNT\ton\bin;E​:\apps\rose\common;E​:\apps\rose\Rational Test;E​:\apps\borland\delphi\Bin;E​:\apps\borland\delphi\Projects\Bpl;E​:\apps\ibm\IBM\IMNNQ;E​:\apps\ibm\db2p\BIN;E​:\apps\ibm\db2p\FUNCTION;E​:\apps\ibm\db2p\SAMPLES\REPL;E​:\apps\ibm\db2p\HELP;e​:\apps\ibm\websphere\bin;G​:\MSVC50\VC\BIN;G​:\MSVC50\VC\BIN\WINNT   PERL_BADLANG (unset)   SHELL (unset)

p5pRT commented 22 years ago

From @ysth

In article <10048932891.20020314012525@motor.ru>, Anton Tagunov <tagunov@motor.ru> wrote:

I see a place for this in perlmod.pod just before

=head2 Package Constructors and Destructors

but have no reasons to object against putting it somewhere else. (To a separate perlglobtut.pod?)

I have no opinion on where the best place for this is.

With best regards and warmest wishes, - Anton Tagunov

--------------------------------------------------------

Don't be misled by C<print(*main::foo)> printing C<'*main::foo'> and C<*main::foo eq 'main::foo'> evaluating to C<1>.

'*main::foo'

It's best to imagine C<*main::foo> as an instance of a special complex datatype. Let us use C\ syntax:

No, please. Seems to me you can describe a record without resorting to another language.

  TYPE Typeglob = RECORD
    SCALAR, ARRAY, HASH, CODE, IO : REFERENCE;
    PACKAGE, NAME : STRING /*immutable*/;
  END;

The following almost equivalent expressions serve as variables strictly typed to Typeglob:

Saying Typeglob makes me think it is a special package name (like ref(qr//) eq 'Regexp'). How about all lowercase typeglob.

  *main::foo

  $main::{foo}
  $main::{'foo'}

(The first one is always of the Typeglob type, and the latter may be either L\ or of the Typeglob type.) Typeglob assignments are always done by reference and never by value.

Not sure what you mean by 'by reference'. Not sure what you mean by 'Typeglob assignments'. The only special thing I know of is that assigning a reference into a typeglob (i.e. *FOO = \$x) only assigns into the slot for the reference's type. Otherwise it is a normal scalar assignment, whether there is a typeglob on the left, right, or both, which copies any fields available in the right value (IV/UV, NV, PV, GP, RV, ...) to the extent possible.
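The slot-only behaviour of assigning a reference into a typeglob can be sketched like this. The variable names are assumptions made for the illustration:

```perl
use strict;
use warnings;

our $name = 'package scalar';
our @name = (1, 2);

my $x = 'lexical';
*name = \$x;              # only the SCALAR slot of *name is replaced

$name .= ' (modified)';   # $name now aliases $x, so this writes to $x

print "$x\n";
print "@name\n";          # the ARRAY slot is untouched
```

Assigning an array or code reference instead would replace only the ARRAY or CODE slot in the same way.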

As Perl code is compiled and identifiers are encountered, the C<< %<package-name>:: >> hash is filled with (references to) newly created Typeglobs. The same identifier in the same package never causes a Typeglob to be created twice.

  sub foo
  $foo %foo @foo &foo *foo foo

all trigger Typeglob creation, while

  $main::{foo}
  $main::{'foo'}

don't.

Typeglobs form a separate scalar data type in Perl and are allowed everywhere regular scalars are allowed. They may become values of scalar variables, members of complex data structures, be passed in and out of functions (as parameters and results), be assigned to each other and back to the C<*main::foo>/C<$main::{foo}> expressions (equivalent expressions are grouped together):

  *main::foo = *main::bar ;
  *main::foo = $main::{bar} ;
  $main::{foo} = *main::bar ;
  $main::{foo} = $main::{bar} ;

  $a = *main::foo ;
  $a = $main::{foo} ;

  $c = [ 0, 1, *other::baz ];

  &examine( *other::baz ) ;
  &examine( $other::{'baz'} ) ;

=for comment Are there any ways but the natural one and localizing

Do you mean 'of localizing'?

*foo to create Typeglob objects? Probably no, spell it out here?

Symbol::gensym returns a reference to a newly created typeglob that is not connected to any symbol table.
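A minimal sketch of that, using the core Symbol module; the in-memory file is an assumption made so the example is self-contained:

```perl
use strict;
use warnings;
use Symbol qw(gensym);

my $sym = gensym();        # reference to a fresh, anonymous typeglob
print ref($sym), "\n";

# the IO slot of the anonymous glob can hold a filehandle ...
my $data = "line one\nline two\n";
open($sym, '<', \$data) or die "cannot open in-memory file: $!";
my $first = <$sym>;
close $sym;

# ... and its other slots are usable as well
@{*$sym} = (1, 2, 3);
print scalar(@{*$sym}), "\n";
```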

All Typeglob assignments are done I<by reference>. This means that after any assignment of a Typeglob value both the source and the destination share the same Typeglob object. The same holds true if a typeglob is passed into or out of a function.

Now I know what you meant 'by reference'. You need to clarify 'typeglob assignments' as assigning a typeglob to a scalar (which may or may not be another typeglob).

If you have an expression that evaluates to a Typeglob value you may access its slots via the following notation:

  ${expr}   # SCALAR
  %{expr}   # HASH
  @{expr}   # ARRAY
  $#{expr}  # ARRAY, access the last element index
  &{expr}   # CODE
  expr      # IO handle

That last is a bareword. In the presence of a * prototype it is silently changed to *expr, which is a typeglob. There is no way to access just the IO handle but *FOO{IO}.

The limitations imposed on the expressions are the same as those imposed on expressions evaluating to hard references that are to be dereferenced via S<C<$, %, @, $#, &>>, see L<perlref>.

If you mean you can say $x = *FOO; %$x, @$x, $$x, &$x but for a more complex expression in place of $x you need curlies around it, you should just say so. The reader may not be able to distinguish what you are referring to in perlref.

  # provided that the 'main::foo' identifier has been seen by the
  # compiler till the current moment *main::foo and $main::{foo}

s/till the current moment/before/

# are strictly equivalent (modulo a slight the performance)

?? Did you mean 'a slight performance difference'?

  #
  # that's why expressions in each of the following pairs are
  # equivalent
  #
  ${*main::foo}          ${$main::{foo}}          # same as $foo
  %{*main::foo}          %{$main::{foo}}          # same as %foo
  @{*main::foo}          @{$main::{foo}}          # same as @foo
  &{*main::foo}          &{$main::{foo}}          # same as &foo
  readline(*main::foo)   readline($main::{foo})   # readline(foo)

Again, no way to get just the IO handle. That's the whole glob there. I'd omit the readline line.

  # an example with a complex data structure
  $main::foo='foo';
  @other::bar=(0,1,2,3);
  $outer{OU} = [ 14,
                 {
                   IN => *main::foo
                 },
                 $other::{bar}
               ];
  print ${$outer{OU}[1]{IN}}, $#{$outer{OU}[2]}; #prints foo3

  #an example of passing Typeglobs in/out of a function
  our ($foo,$bar)=('foo','bar');
  sub a{ (*bar,${shift()}) }
  my @a=&a(*foo);
  print ${$a[0]}, $a[1];

Another notation related to typeglobs is

  *foo{CODE}    # equivalent to \&foo
  *foo{SCALAR}  # equivalent to \$foo
  *foo{HASH}    # equivalent to \%foo
  *foo{ARRAY}   # equivalent to \@foo
  *foo{CODE}    # equivalent to \&foo
  *foo{IO}      # equivalent to \foo

Nope. \foo is a reference to the return value(s) of foo() if a sub foo is defined. Otherwise it is a reference to the string "foo" (with a warning).

It allows one to obtain hard references to the slots of the C<*main::foo> Typeglob variable. Unlike the previous one, this notation works only with a literal C<*foo> and is not applicable to general expressions evaluating to Typeglobs. You can partially bypass this limitation and obtain references to the slots of a Typeglob by doing:

  # let's assume my $v=\*foo;
  \${expr}   # then \$$v gets \$foo
  \%{expr}   # then \%$v gets \%foo
  \@{expr}   # then \@$v gets \@foo
  \&{expr}   # then \&$v gets \&foo

but there's no workaround to obtain a reference to the C<IO> slot (C<\foo> in our example).

Nope. You can say *$x{IO} or *{get_a_typeglob()}{IO} or *{$x->{typglb}}{IO}. No workaround needed.
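A short sketch of the braced form applied to expressions rather than a literal glob name. The names C<$globref> and C<%holder> are made up for the illustration:

```perl
use strict;
use warnings;

our @foo = (1, 2, 3);

my $globref = \*foo;             # reference to a typeglob
my %holder  = (glob => *foo);    # typeglob stored as a hash value

my $direct   = *foo{ARRAY};                 # literal glob name
my $via_ref  = *{$globref}{ARRAY};          # through a glob reference
my $via_expr = *{ $holder{glob} }{ARRAY};   # through a general expression

print "all three refer to \@foo\n"
    if $direct == \@foo && $via_ref == \@foo && $via_expr == \@foo;

my $out = \*STDOUT;
my $io  = *{$out}{IO};           # the IO slot of STDOUT's typeglob
print "IO slot is defined\n" if defined $io;
```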

That's as far as I can read right now. I may comment on the rest later. A couple of closing comments: you mention the *main::foo <-> $main::{foo} equivalence repeatedly. Why not omit all that and start the document by saying (only much more fleshed out):

Packages are just symbol tables. Symbol tables are hashes. Values in symbol table hashes are called typeglobs. You can access typeglobs either through the symbol table hash (C<$main::{foo}>) or directly, with a '*' prefix (C<*main::foo>). (In the latter case, the entry is created at compile time. In the former, it will not exist unless a $main::foo, sub main::foo, etc. exists.) Typeglobs have multiple slots: SCALAR, CODE, etc. These are normally accessed by $foo, &foo, etc. but if you have a typeglob t you can say *{t}{SCALAR} *{t}{CODE} etc., which is equivalent. (The {} around t are optional if it is a simple scalar.) You can assign typeglobs to scalars and pass them to subs or return them from subs just like any other scalar.
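That summary can be illustrated with a small sketch. The package name C<Demo> and its variables are hypothetical and serve only as an example:

```perl
use strict;
use warnings;

package Demo;
our $count = 42;
our @list  = (1, 2, 3);

package main;

# the package's symbol table is the hash %Demo::
my @names = sort grep { /^(?:count|list)$/ } keys %Demo::;
print "@names\n";

# a typeglob fetched through the symbol-table hash ...
my $glob = $Demo::{list};
print "same array slot\n" if *{$glob}{ARRAY} == \@Demo::list;

# ... and one named directly with the '*' prefix
my $value = ${ *Demo::count };
print "$value\n";

# merely looking up a name in the hash does not create an entry
my $has_entry = exists $Demo::{no_such_name} ? 1 : 0;
print "no_such_name present: $has_entry\n";
```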

So far I haven't seen anything that mentions the curious fact that you can use typeglobs and refs to typeglobs pretty interchangeably.