Open jhh67 opened 2 years ago
I looked into this a bit, and the conflict that's occurring is between the module named borrowed
that's created from the filename borrowed.chpl
(that was the easy part) and an internal type named borrowed
that we introduce within the compiler to represent the borrowed
classes:
dtBorrowed = createInternalType("borrowed", "borrowed");
While I determined as John mentions in the OP that if
didn't have a similar error, that's because we don't create internal symbols named "if". We do make a number of other internal symbols—particularly around type—that will have the same problem as in the OP. E.g., a file named int.chpl
has the same issue for the same reason.
So I think possible paths forward are:
_
or chpl_
onto such types.I'd honestly prefer if it were not possible to use any reserved words as file names. I think this is the simplest and most straight-forward approach to take. We already emit errors when trying to redefine reserved word/type names in e.g. a procedure, and so doing the same for modules (whether implicit or explicit) seems consistent to me.
@dlongnecke-cray: Would you disallow a file named string.chpl
whose contents were:
module M {
...
}
(i.e., that did not introduce an implicit module named string
?).
Wait that's possible? I can never keep track of the rules...How do you get at the contents of string.chpl
if you can't say use string
? And if you say use string
, doesn't that imply there's a module named string
? Or in this case does it have to be a command line module (e.g., you can only get at it via chpl string.chpl
, but beyond that point, it's inaccessible...)?
Wait that's possible? I can never keep track of the rules... A filename is only used as the module name if the file contains top-level code that is not within a module declaration itself. And arguably, a best practice for writing super-safe code (i.e., not a quick sketch) would be to not rely on the filename-based module name so that the behavior of the code is independent of the file you happen to be storing it in. My general philosophy is that language behavior and semantics shouldn't be too tied to a filename so that you can do things like pipe a file into a compiler or tool, paste it into a web compiler, whatever without a behavior change. I still think the "implicit module name based on filename" feature is important, though, from a convenience perspective (e.g., writing a one-line hello world) and so that all code is defined in some module.
How do you get at the contents of string.chpl if you can't say use string? And if you say use string, doesn't that imply there's a module named string? Or in this case does it have to be a command line module (e.g., you can only get at it via chpl string.chpl, but beyond that point, it's inaccessible...)?
In this case, there is no module named string
or anything based on the filename, only a module named M
. So you'd use use M...
or import M...
to get at the code defined in string.chpl
.
(Is string.chpl
a good filename for this code? No. But it also seems innocuous to me, which is why I was curious whether you'd disallow this case—which I think is what a literal reading of your previous comment would suggest).
That's super strange and kind of feels like dishonest code to me 😄, in that if I think that would be frowned upon if we (or other Chapel programmers, I'd hope) happened to see that in the wild.
How does the compiler know to go search string.chpl
for M
if it doesn't happen to be listed on the original command line invocation?
I get the whole attitude about not wanting to impose too much structure on filenames, but I think we already impose some structure (e.g., we require a .chpl
extension). Additionally I think we expect any useable module to be a valid Chapel identifier, and once we're at that stage I think it's not too strange to expect that the identifier conform to the same rules that other symbol names conform ot
I don't know whether you've noticed, but so far you've completely dodged answering my question. :) (and I didn't mean it rhetorically).
That's super strange and kind of feels like dishonest code to me 😄, in that if I think that would be frowned upon if we (or other Chapel programmers, I'd hope) happened to see that in the wild.
Well, sure but there are more benign forms as well that seem way less sketchy. Say I've got M.chpl
that contains module M { ... }
and then I find a bug so I make a clone of it called M-bug.chpl
or I want to have multiple versions of it, so I create one called M-0.1.chpl
, M-0.2.chpl
, M-working.chpl
etc. These are obviously less strange than a file named string
containing a module named M
, but illustrate reasons why a filename may not describe the module it defines precisely, and why it would be annoying if it had to do so, or had to be a legal identifier.
How does the compiler know to go search string.chpl for M if it doesn't happen to be listed on the original command line invocation?
It wouldn't, and you're right that this would need to be specified on the chpl
command-line in order to be usable/useful (though at times it has been proposed at times that the compiler should be smarter than it currently is in terms of searching for modules—so that, for example, in my M-*.chpl
examples above, if only one of the files were in the search path, it could resolve it as defining M
despite not being named M.chpl
; but that's sometimes controversial, so let's ignore it for this conversation).
Additionally I think we expect any useable module to be a valid Chapel identifier...
Well yes, but we don't preclude people from using filenames that are not valid Chapel identifiers (so, emphasis on 'useable' in your phrase above), where a common use case is for a main module or single-file test. I.e., if I put writeln("Hello, David!");
into a file named hello-david.chpl
, I can't refer to the module by name (since the hyphen is a subtraction in Chapel), but I'm also not precluded from defining it (which I think is definitely a good thing... imagine all the hyphen-using test filenames we'd need to rename otherwise). I view this issue as arguably requesting the same thing ("So I happened to name my main module the same thing as a reserved word without realizing it... why are you holding that against me?").
Another motivation might be that I want to create a binary / command-line utility whose name matches a reserved word. If we permit it, a file named owned.chpl
can create a binary named owned
without my having to use a different name and compile with the -o owned
flag to get it to be what I want.
...I think it's not too strange to expect that the identifier conform to the same rules that other symbol names conform
It seems arbitrary to me to say "some filenames that are not legal identifers are OK and others are not." And if we were to put in a rule to blacklist certain filenames, I'd just as soon make the rule munge those symbols into a form that the user couldn't refer to... and potentially issue a warning letting them know (though that may just be annoying if I'm not planning to make a module out of it...).
I don't know whether you've noticed, but so far you've completely dodged answering my question. :) (and I didn't mean it rhetorically).
Oh, I notice. I just didn't have enough information to answer the question yet 😄.
Given the discussion, I think I would only issue an error over string.chpl
if it happens to introduce an implicit module named string
. So really, the error is for the module name, and we just point to renaming the file (or making an explicit module) as ways to fix it.
So yeah, you could do a string.chpl
if the contents are module M { ... }
. Is that strange? I think so, but...
Edit: And as to what you're probably alluding to, maybe I can open a separate issue along the lines of: "Should reserved words be allowed as module names?"
I think I would only issue an error over string.chpl if it happens to introduce an implicit module named string
What's your rationale for permitting some filenames that are not legal identifiers introduce un-typeable module names (again hello-david.chpl
being an example) but not others?
It doesn't really seem to matter what the source file name is if it's not creating an implicit module at all. In which case string.chpl
and hello-david.chpl
are both fine, I think.
So what's interesting is what happens when implicit modules are created...
Coming back to the hello-david.chpl
case, it seems like that module name would also be unuseable due to munging unless we document what that munging is or allow some form like use "hello-david.chpl"
. I'm not sure what to do there, but it doesn't seem like a consistency problem (compared to string.chpl
) because that particular case is broken due to the presence of -
.
In a world where we have a way to do use "hello-david"
, then I think such an implicit module name is fine. The name isn't really the issue here, it's just that we can't get at the thing because we can't express it.
In the string.chpl
case, creating an implicit module out of string
seems wrong for the same reason that module string { ... }
seems wrong. I don't believe we allow the latter today. The compiler emits an error in parsing along the lines of "error: attempt to redefine reserved word 'string'" (I interacted with this one recently 😄).
Even if we could make use string
parse, would we want to? We emit errors for proc string() { ... }
as well, today. If we allow one but not the other, then we start having different rules for different categories of symbol and it all gets rather difficult to keep track of.
The presence of an implicit module string
implies the existence of string.chpl
. The only way to avoid that is to politely ask that users name the source file something else. Maybe we could do munging, but it seems convoluted to do munging on a file name just because it conflicts with a reserved word. Honestly like a bit more trouble than it's worth.
I'm not going to try and advocate for ProperChapelSymbolNamesOnly.chpl
only even if that's what I personally prefer, because I'm keenly aware that a large percentage of our 14k+ tests use foo-bar.chpl
, and users probably use that format for source files as well.
However I think we can allow hello-david.chpl
while disallowing string.chpl
or borrowed.chpl
and still have everything be internally consistent.
And as devil's advocate against myself, maybe we could do use "string" as StringModule
or use string as StringModule
etc., as a workaround to name conflict (users can only ever refer to it outside of the initial use by using a Chapel identifier, via renaming). I still feel weird about that, though. If we allow that, then why don't we allow proc string() { ...}
and from foo use string as stringProc
? I'd prefer if our different categories of symbols followed the same set of rules.
Coming back to the hello-david.chpl case, it seems like that module name would also be unuseable due to munging
That's correct, it is.
due to munging unless we document what that munging is or allow some form like use "hello-david.chpl"
Let's assume we don't do those things for this discussion (mostly because they're irrelevant; partially because I don't want to do them).
creating an implicit module out of string seems wrong for the same reason that module string { ... } seems wrong. The presence of an implicit module string...
But wait. The leading proposal is to create a(n undefined) munged name for string.chpl
just like with hello-david.chpl
. For example, we could call it @@@string@@@
or @@@chpl_reserved_word_module_1@@@
and not divulge what we were going to call it. If a user tried to type use string...
or import string...
they'd get a syntax error (or better) and see the error in their ways. If they were just using that name for a single-file test or main module, then it's moot.
Even if we could make
use string
parse, would we want to?
No, and I'm not proposing that we would want to do that or should try.
The only way to avoid that is to politely ask that users name the source file something else. Maybe we could do munging, but it seems convoluted to do munging on a file name just because it conflicts with a reserved word. Honestly like a bit more trouble than it's worth.
This is where I was arguing that if we have a blacklist of "things you may not name your file" in order to generate that polite warning, then we might as well just have things on that blacklist generate a munged name instead. It's not like it's harder or more work, and if it saves users who accidentally reach for a reserved word frustration ("Sorry, you can't use index.chpl
"), that seems like a big win. IMO, what is or isn't a legal filename (or base filename if you're going to bring up .chpl
again :) ) shouldn't be dictated by the language, just the OS.
Hmmmm, yeah, one glaring flaw in my POV is that a module name like module hello-david { ... }
is definitely not legal as well (different reason, but still not legal), which kind of suggests that string
and hello-david
should be treated the same. I kind of fixated on the string
case too much.
I think I'm OK with the "your implicit module can be named hello-david
or string
, but it isn't useable directly, at least for now" approach. Implementation-wise, I'm not even sure if we have to go so far as to munge it at all. At least, I think dyno ought to be able to handle a module named string
just fine. Sounds like the production compiler has issues, though.
I was going to go down the "but why can't we have procedures named string
" track, but I think that's rather moot because modules are already special owing to hello-david
🥲.
Implementation-wise, I'm not even sure if we have to go so far as to munge it at all.
If it never bit people, I'd agree that this would be fine. In practice it seems to, though, which suggests doing something more if dyno doesn't make the problem go away.
I've run into this issue recently when working on #25125. At present, I think some of our dyno code is assuming that you would get a redefinition error if you try to write something silly like this:
module int { } // int.chpl:1: error: attempt to redefine reserved type 'int'
And, so we can assume that something named int
always refers to the builtin int
type (i.e, the regular keyword).
The trouble happens with the implicit module created. For example, we can have this:
// int.chpl -- creates implicit module int
and now we don't get such an error; in fact, we get different errors depending on the specific keyword / type being used.
I'm going to look at the munging approach because
subdomain.chpl
being one example in our testing); so we should make some effort not to break compilation in such casesupdate the logic that munges filenames into legal module names (e.g., removing punctuation) to sidestep conflicts with (some/all) reserved words by calling the module something else
To be clear, we aren't munging such module names today before codegen. Instead, we have module names that can't be referred to because you can't write the identifier naming them. When we get to generating C files or choosing the function name for a module initializers or anything like that, we do the munging. But within our own compiler, we're not munging; the implicit module name for a file M-bug.chpl
really is M-bug
.
Summary of Problem
At least some reserved words cannot be used as a file name and cause a confusing error message. I got the following error trying to compile
borrowed.chpl
However, Brad was able to compile
if.chpl
so not all reserved words cause the error.Steps to Reproduce
Source Code:
Compile command:
chpl borrowed.chpl
Execution command:
NA
Associated Future Test(s):
NA
Configuration Information
chpl --version
:$CHPL_HOME/util/printchplenv --anonymize
:gcc --version
orclang --version
:module list
: