dschmenk / PLASMA

Proto Language AsSeMbler for All (formerly Apple)
MIT License
189 stars 26 forks source link

Handling of long imported/exported identifiers #66

Closed ZornsLemma closed 1 year ago

ZornsLemma commented 2 years ago

README.md says, about identifiers for exported/imported functions:

They are all converted to uppercase with 16 characters significant when the loader resolves them.

Is this actually true? I admit I haven't tested the Apple VM, but looking at the code it doesn't seem to obviously do any kind of truncation when comparing, and it seems as if the compiler is happy to generate modules with identifiers longer than 16 characters (although it does truncate at 32 characters).

I can't help thinking it might be simplest and safest if the compiler simply refused to compile code which imports or exports a name longer than 16 characters. This would avoid potential for a buffer overrun in the VM (lookupextern() uses a fixed 16 byte buffer for 'str', which I think might actually imply a maximum length of 15 characters?) and would avoid any surprises for users who happen to have two identifiers which differ only after the first 16 characters.

I appreciate you may not have the time or inclination to look into this at the moment but I'd appreciate your thoughts. Making the compiler generate these errors would probably be something I could implement myself if you'd be interested in a pull request to do this.

dschmenk commented 2 years ago

Off the top of my head, I think this is actually a limitation of Apple’s ProDOS and SOS operating systems. Filenames are limited to 16 characters although the entire path can extend to 64, IIRC. I use paths for storing system modules. Does that make any sense?

On Jul 4, 2022, at 11:21 AM, ZornsLemma @.***> wrote:

README.md says, about identifiers for exported/imported functions:

They are all converted to uppercase with 16 characters significant when the loader resolves them.

Is this actually true? I admit I haven't tested the Apple VM, but looking at the code it doesn't seem to obviously do any kind of truncation when comparing, and it seems as if the compiler is happy to generate modules with identifiers longer than 16 characters (although it does truncate at 32 characters).

I can't help thinking it might be simplest and safest if the compiler simply refused to compile code which imports or exports a name longer than 16 characters. This would avoid potential for a buffer overrun in the VM (lookupextern() uses a fixed 16 byte buffer for 'str', which I think might actually imply a maximum length of 15 characters?) and would avoid any surprises for users who happen to have two identifiers which differ only after the first 16 characters.

I appreciate you may not have the time or inclination to look into this at the moment but I'd appreciate your thoughts. Making the compiler generate these errors would probably be something I could implement myself if you'd be interested in a pull request to do this.

— Reply to this email directly, view it on GitHub https://github.com/dschmenk/PLASMA/issues/66, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBOGJSLTYWBGJOA37ALEUTVSMTTRANCNFSM52T5FVXQ. You are receiving this because you are subscribed to this thread.

ZornsLemma commented 2 years ago

Hi Dave,

Thanks for getting back to me on both of my issues, I hope you enjoyed your camping trip!

I am not sure if what you're saying is going over my head or we're talking at cross purposes. What I'm thinking of here are the names of variables exported by modules and imported by other modules, not the names of the modules themselves. (Incidentally, the lowest common denominator filing system on Acorn only allows 7 characters in the module name.)

This came up because a user of the Acorn port over on stardot reported that they had a module session.pla which did:

export def session_get_new_session(station_id)#1                                                                                                              
    return 0                                                                     
end                                                                              

and another module ecomud.pla which did:

import SESSION
   predef session_get_new_session(station_id)#1
end

and at runtime the VM got upset loading the ecomud module because it couldn't resolve the session_get_new_session import. Changing the name to be <=16 characters fixed this.

I think the Acorn port may be a little quirkier than the Apple port in this regard but I thought there was enough uncertainty around what was correct that I wanted to raise this issue to discuss it with you. The documentation seems to suggest that imports will resolve correctly if they match in the first 16 characters, but it doesn't look (I haven't tested it) as though this is actually true even on the Apple VM, and I thought it would be simpler and also avoid the risk of buffer overflows in lookupextern() if the compiler enforced a limit of 16 characters for exported/imported symbol names.

It's not easy for me to test the Apple VM at the moment as I'm on a Linux box and can't run CiderPress to make disk images for an emulator, but if you like I can try to investigate this in a week or so when I get access to a Windows machine.

Cheers!

Steve

ZornsLemma commented 2 years ago

Hi Dave,

Thanks for getting back to me on both of my issues, I hope you enjoyed your trip!

I think I might be missing your point, but what I'm talking about here are the symbols exported by modules and imported by other modules, not the names of the modules themselves. (Incidentally, on the Acorn port the lowest common denominator limit on module names is 7 characters.) This came up because a user of the Acorn port reported that they had a module session.pla which did:

export def session_get_new_session(station_id)#1
    return 0
end

and another module ecomud.pla which did:

import SESSION
    predef session_get_new_session(station_id)#1
end

and at runtime the VM got upset loading the ecomud module because it couldn't resolve the import correctly. Changing the name to have <=16 characters seemed to fix this.

I think the Acorn port may be a little pickier than the Apple VM about this, but looking at the Apple VM code it still seems there is an assumption (e.g. in lookupextern() in cmd.pla) that the names of symbols imported/exported are <=16 characters, even though the compiler seems to be willing to emit imports/exports of longer symbol names. The documentation seems to imply that only the first 16 characters of a name are significant, but it doesn't look to me as though that's the case. (I haven't been able to test the Apple VM as I'm on a Linux box at the moment and can't run CiderPress; if you like I can try to produce some more concrete examples for you later once I get access to a Windows machine.)

I'm obviously happy to fix the Acorn port if it's buggy, but I had enough doubt about how this works in general that I thought I'd better check with you first. If it's not hard, I can't help feeling that making the compiler refuse to generate modules which import/export symbols of >16 characters would be least confusing for users and would also avoid any risk of buffer overflows in the VM at runtime.

I hope this makes sense anyway. Github/Firefox swallowed the first version of this comment, but if it looks like I've written the same thing twice it will be because the first version has magically reappeared after all. :-)

Cheers.

Steve

ZornsLemma commented 2 years ago

And yes, despite me refreshing the page several times in despair when the first comment got swallowed, it has reappeared after all. Both of those comments try to say the same thing, but perhaps in slightly different ways. :-)

dschmenk commented 1 year ago

Steve - not ignoring you, I promise! I think you've uncovered a couple places I failed to create identifiers up to 32 significant characters. Still trying to get my old brain up to speed with what I did years ago. Some idiot forgot to document his code! Can you expand a little on the tools used? Cross compiler or self hosted? Thanks,

Dave...

ZornsLemma commented 1 year ago

Hi Dave,

I appreciate you taking the time to look into this! I'm suffering broadband problems right now so can't easily create a complete example and upload it, but if it helps I believe all the cases I've been looking at were using the cross-compiler (i.e. the one written in C).

Cheers.

Steve

On 19 July 2022 23:10:41 BST, David Schmenk @.***> wrote:

Steve - not ignoring you, I promise! I think you've uncovered a couple places I failed to create identifiers up to 32 significant characters. Still trying to get my old brain up to speed with what I did years ago. Some idiot forgot to document his code! Can you expand a little on the tools used? Cross compiler or self hosted? Thanks,

Dave...

-- Reply to this email directly or view it on GitHub: https://github.com/dschmenk/PLASMA/issues/66#issuecomment-1189601076 You are receiving this because you authored the thread.

Message ID: @.***>

dschmenk commented 1 year ago

I built Alpha 2 with the 32 character identifier length. Does this address the problem?

ZornsLemma commented 1 year ago

Thanks Dave, this seems to work well! The example exported function name session_get_new_session() now works for me and examining the generated module shows it has its full name, as we'd expect with the new 32 character limit. Even better, if I use a function with a name >32 characters long, it still works and the name is truncated to 32 characters internally.

Here (just for reference; I'm not suggesting you merge this) is a diff which modifies test.pla/testlib.pla to exercise this, and is how I tested it myself:

diff --git a/src/inc/testlib.plh b/src/inc/testlib.plh
index 719f940..e60f24b 100644
--- a/src/inc/testlib.plh
+++ b/src/inc/testlib.plh
@@ -1,4 +1,5 @@
 import testlib
+  predef this_is_a_really_long_name_with_way_more_than_32_characters(station_id)#1
   word print
   const dec = 0
   const hex = 2
diff --git a/src/samplesrc/test.pla b/src/samplesrc/test.pla
index 73a29eb..10975ad 100755
--- a/src/samplesrc/test.pla
+++ b/src/samplesrc/test.pla
@@ -209,4 +209,6 @@ puts("5*0="); puti(ptr*0); putln
 puts("5*1="); puti(ptr*1); putln
 puts("5*2="); puti(ptr*2); putln
 puts("5/1="); puti(ptr/1); putln
+this_is_a_really_long_name_with_way_more_than_32_characters(4)
+this_is_a_really_long_name_with_way_more_than_32_characters(8)
 done
diff --git a/src/samplesrc/testlib.pla b/src/samplesrc/testlib.pla
index 55ce84d..47b37a5 100755
--- a/src/samplesrc/testlib.pla
+++ b/src/samplesrc/testlib.pla
@@ -19,6 +19,12 @@ def puthex(h)#0
   putc(valstr[(h >> 4)  & $0F])
   putc(valstr[ h        & $0F])
 end
+export def this_is_a_really_long_name_with_way_more_than_32_characters(station_id)#1
+  puts("sgns(")
+  puth(station_id)
+  puts(")")
+  putln
+end
 puts(@loadstr)
 putln
 done