chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

some reserved words can't be used as a file name #19197

Open jhh67 opened 2 years ago

jhh67 commented 2 years ago

Summary of Problem

At least some reserved words cannot be used as a file name and cause a confusing error message. I got the following error trying to compile borrowed.chpl

error: 'borrowed' has multiple definitions, redefined at:
  borrowed.chpl:1

However, Brad was able to compile if.chpl so not all reserved words cause the error.

Steps to Reproduce

Source Code:

writeln("Hello World!");

Compile command:

chpl borrowed.chpl

Execution command:

NA

Associated Future Test(s):

NA

Configuration Information

bradcray commented 2 years ago

I looked into this a bit, and the conflict that's occurring is between the module named borrowed that's created from the filename borrowed.chpl (that was the easy part) and an internal type named borrowed that we introduce within the compiler to represent the borrowed classes:

dtBorrowed = createInternalType("borrowed", "borrowed");

While I determined as John mentions in the OP that if didn't have a similar error, that's because we don't create internal symbols named "if". We do make a number of other internal symbols—particularly around type—that will have the same problem as in the OP. E.g., a file named int.chpl has the same issue for the same reason.

So I think possible paths forward are:

dlongnecke-cray commented 2 years ago

I'd honestly prefer if it were not possible to use any reserved words as file names. I think this is the simplest and most straight-forward approach to take. We already emit errors when trying to redefine reserved word/type names in e.g. a procedure, and so doing the same for modules (whether implicit or explicit) seems consistent to me.

bradcray commented 2 years ago

@dlongnecke-cray: Would you disallow a file named string.chpl whose contents were:

module M {
  ...
}

(i.e., that did not introduce an implicit module named string?).

dlongnecke-cray commented 2 years ago

Wait that's possible? I can never keep track of the rules...How do you get at the contents of string.chpl if you can't say use string? And if you say use string, doesn't that imply there's a module named string? Or in this case does it have to be a command line module (e.g., you can only get at it via chpl string.chpl, but beyond that point, it's inaccessible...)?

bradcray commented 2 years ago

Wait that's possible? I can never keep track of the rules... A filename is only used as the module name if the file contains top-level code that is not within a module declaration itself. And arguably, a best practice for writing super-safe code (i.e., not a quick sketch) would be to not rely on the filename-based module name so that the behavior of the code is independent of the file you happen to be storing it in. My general philosophy is that language behavior and semantics shouldn't be too tied to a filename so that you can do things like pipe a file into a compiler or tool, paste it into a web compiler, whatever without a behavior change. I still think the "implicit module name based on filename" feature is important, though, from a convenience perspective (e.g., writing a one-line hello world) and so that all code is defined in some module.

How do you get at the contents of string.chpl if you can't say use string? And if you say use string, doesn't that imply there's a module named string? Or in this case does it have to be a command line module (e.g., you can only get at it via chpl string.chpl, but beyond that point, it's inaccessible...)?

In this case, there is no module named string or anything based on the filename, only a module named M. So you'd use use M... or import M... to get at the code defined in string.chpl.

(Is string.chpl a good filename for this code? No. But it also seems innocuous to me, which is why I was curious whether you'd disallow this case—which I think is what a literal reading of your previous comment would suggest).

dlongnecke-cray commented 2 years ago

That's super strange and kind of feels like dishonest code to me 😄, in that if I think that would be frowned upon if we (or other Chapel programmers, I'd hope) happened to see that in the wild.

How does the compiler know to go search string.chpl for M if it doesn't happen to be listed on the original command line invocation?

I get the whole attitude about not wanting to impose too much structure on filenames, but I think we already impose some structure (e.g., we require a .chpl extension). Additionally I think we expect any useable module to be a valid Chapel identifier, and once we're at that stage I think it's not too strange to expect that the identifier conform to the same rules that other symbol names conform ot

bradcray commented 2 years ago

I don't know whether you've noticed, but so far you've completely dodged answering my question. :) (and I didn't mean it rhetorically).

That's super strange and kind of feels like dishonest code to me 😄, in that if I think that would be frowned upon if we (or other Chapel programmers, I'd hope) happened to see that in the wild.

Well, sure but there are more benign forms as well that seem way less sketchy. Say I've got M.chpl that contains module M { ... } and then I find a bug so I make a clone of it called M-bug.chpl or I want to have multiple versions of it, so I create one called M-0.1.chpl, M-0.2.chpl, M-working.chpl etc. These are obviously less strange than a file named string containing a module named M, but illustrate reasons why a filename may not describe the module it defines precisely, and why it would be annoying if it had to do so, or had to be a legal identifier.

How does the compiler know to go search string.chpl for M if it doesn't happen to be listed on the original command line invocation?

It wouldn't, and you're right that this would need to be specified on the chpl command-line in order to be usable/useful (though at times it has been proposed at times that the compiler should be smarter than it currently is in terms of searching for modules—so that, for example, in my M-*.chpl examples above, if only one of the files were in the search path, it could resolve it as defining M despite not being named M.chpl; but that's sometimes controversial, so let's ignore it for this conversation).

Additionally I think we expect any useable module to be a valid Chapel identifier...

Well yes, but we don't preclude people from using filenames that are not valid Chapel identifiers (so, emphasis on 'useable' in your phrase above), where a common use case is for a main module or single-file test. I.e., if I put writeln("Hello, David!"); into a file named hello-david.chpl, I can't refer to the module by name (since the hyphen is a subtraction in Chapel), but I'm also not precluded from defining it (which I think is definitely a good thing... imagine all the hyphen-using test filenames we'd need to rename otherwise). I view this issue as arguably requesting the same thing ("So I happened to name my main module the same thing as a reserved word without realizing it... why are you holding that against me?").

Another motivation might be that I want to create a binary / command-line utility whose name matches a reserved word. If we permit it, a file named owned.chpl can create a binary named owned without my having to use a different name and compile with the -o owned flag to get it to be what I want.

...I think it's not too strange to expect that the identifier conform to the same rules that other symbol names conform

It seems arbitrary to me to say "some filenames that are not legal identifers are OK and others are not." And if we were to put in a rule to blacklist certain filenames, I'd just as soon make the rule munge those symbols into a form that the user couldn't refer to... and potentially issue a warning letting them know (though that may just be annoying if I'm not planning to make a module out of it...).

dlongnecke-cray commented 2 years ago

I don't know whether you've noticed, but so far you've completely dodged answering my question. :) (and I didn't mean it rhetorically).

Oh, I notice. I just didn't have enough information to answer the question yet 😄.

Given the discussion, I think I would only issue an error over string.chpl if it happens to introduce an implicit module named string. So really, the error is for the module name, and we just point to renaming the file (or making an explicit module) as ways to fix it.

So yeah, you could do a string.chpl if the contents are module M { ... }. Is that strange? I think so, but...


Edit: And as to what you're probably alluding to, maybe I can open a separate issue along the lines of: "Should reserved words be allowed as module names?"

bradcray commented 2 years ago

I think I would only issue an error over string.chpl if it happens to introduce an implicit module named string

What's your rationale for permitting some filenames that are not legal identifiers introduce un-typeable module names (again hello-david.chpl being an example) but not others?

dlongnecke-cray commented 2 years ago

It doesn't really seem to matter what the source file name is if it's not creating an implicit module at all. In which case string.chpl and hello-david.chpl are both fine, I think.

So what's interesting is what happens when implicit modules are created...

Coming back to the hello-david.chpl case, it seems like that module name would also be unuseable due to munging unless we document what that munging is or allow some form like use "hello-david.chpl". I'm not sure what to do there, but it doesn't seem like a consistency problem (compared to string.chpl) because that particular case is broken due to the presence of -.

In a world where we have a way to do use "hello-david", then I think such an implicit module name is fine. The name isn't really the issue here, it's just that we can't get at the thing because we can't express it.

In the string.chpl case, creating an implicit module out of string seems wrong for the same reason that module string { ... } seems wrong. I don't believe we allow the latter today. The compiler emits an error in parsing along the lines of "error: attempt to redefine reserved word 'string'" (I interacted with this one recently 😄).

Even if we could make use string parse, would we want to? We emit errors for proc string() { ... } as well, today. If we allow one but not the other, then we start having different rules for different categories of symbol and it all gets rather difficult to keep track of.

The presence of an implicit module string implies the existence of string.chpl. The only way to avoid that is to politely ask that users name the source file something else. Maybe we could do munging, but it seems convoluted to do munging on a file name just because it conflicts with a reserved word. Honestly like a bit more trouble than it's worth.

I'm not going to try and advocate for ProperChapelSymbolNamesOnly.chpl only even if that's what I personally prefer, because I'm keenly aware that a large percentage of our 14k+ tests use foo-bar.chpl, and users probably use that format for source files as well.

However I think we can allow hello-david.chpl while disallowing string.chpl or borrowed.chpl and still have everything be internally consistent.

And as devil's advocate against myself, maybe we could do use "string" as StringModule or use string as StringModule etc., as a workaround to name conflict (users can only ever refer to it outside of the initial use by using a Chapel identifier, via renaming). I still feel weird about that, though. If we allow that, then why don't we allow proc string() { ...} and from foo use string as stringProc? I'd prefer if our different categories of symbols followed the same set of rules.

bradcray commented 2 years ago

Coming back to the hello-david.chpl case, it seems like that module name would also be unuseable due to munging

That's correct, it is.

due to munging unless we document what that munging is or allow some form like use "hello-david.chpl"

Let's assume we don't do those things for this discussion (mostly because they're irrelevant; partially because I don't want to do them).

creating an implicit module out of string seems wrong for the same reason that module string { ... } seems wrong. The presence of an implicit module string...

But wait. The leading proposal is to create a(n undefined) munged name for string.chpl just like with hello-david.chpl. For example, we could call it @@@string@@@ or @@@chpl_reserved_word_module_1@@@ and not divulge what we were going to call it. If a user tried to type use string... or import string... they'd get a syntax error (or better) and see the error in their ways. If they were just using that name for a single-file test or main module, then it's moot.

Even if we could make use string parse, would we want to?

No, and I'm not proposing that we would want to do that or should try.

The only way to avoid that is to politely ask that users name the source file something else. Maybe we could do munging, but it seems convoluted to do munging on a file name just because it conflicts with a reserved word. Honestly like a bit more trouble than it's worth.

This is where I was arguing that if we have a blacklist of "things you may not name your file" in order to generate that polite warning, then we might as well just have things on that blacklist generate a munged name instead. It's not like it's harder or more work, and if it saves users who accidentally reach for a reserved word frustration ("Sorry, you can't use index.chpl"), that seems like a big win. IMO, what is or isn't a legal filename (or base filename if you're going to bring up .chpl again :) ) shouldn't be dictated by the language, just the OS.

dlongnecke-cray commented 2 years ago

Hmmmm, yeah, one glaring flaw in my POV is that a module name like module hello-david { ... } is definitely not legal as well (different reason, but still not legal), which kind of suggests that string and hello-david should be treated the same. I kind of fixated on the string case too much.

I think I'm OK with the "your implicit module can be named hello-david or string, but it isn't useable directly, at least for now" approach. Implementation-wise, I'm not even sure if we have to go so far as to munge it at all. At least, I think dyno ought to be able to handle a module named string just fine. Sounds like the production compiler has issues, though.

I was going to go down the "but why can't we have procedures named string" track, but I think that's rather moot because modules are already special owing to hello-david 🥲.

bradcray commented 2 years ago

Implementation-wise, I'm not even sure if we have to go so far as to munge it at all.

If it never bit people, I'd agree that this would be fine. In practice it seems to, though, which suggests doing something more if dyno doesn't make the problem go away.

mppf commented 3 months ago

I've run into this issue recently when working on #25125. At present, I think some of our dyno code is assuming that you would get a redefinition error if you try to write something silly like this:

module int { } // int.chpl:1: error: attempt to redefine reserved type 'int'

And, so we can assume that something named int always refers to the builtin int type (i.e, the regular keyword).

The trouble happens with the implicit module created. For example, we can have this:

// int.chpl -- creates implicit module int

and now we don't get such an error; in fact, we get different errors depending on the specific keyword / type being used.

I'm going to look at the munging approach because

  1. This is causing problems for me now
  2. We are stable and people might have files with keyword names (subdomain.chpl being one example in our testing); so we should make some effort not to break compilation in such cases
  3. The discussion seemed to be leading towards the munging approach anyway.
mppf commented 3 months ago

update the logic that munges filenames into legal module names (e.g., removing punctuation) to sidestep conflicts with (some/all) reserved words by calling the module something else

To be clear, we aren't munging such module names today before codegen. Instead, we have module names that can't be referred to because you can't write the identifier naming them. When we get to generating C files or choosing the function name for a module initializers or anything like that, we do the munging. But within our own compiler, we're not munging; the implicit module name for a file M-bug.chpl really is M-bug.