GideonZ / 1541ultimate

Official GIT archive of 1541 ultimate II sources
GNU General Public License v3.0

Bug in the hyperspeed kernal #394

Open markusC64 opened 5 months ago

markusC64 commented 5 months ago

In the hyperspeed kernal, the following BASIC program gives the wrong result (adapt the device number to your IEC device):

    10 open 1,10,15
    20 print#1,"md:foo";
    30 print#1,"i"
    40 close 1

With UCI enabled, a directory "fooi" is created. With UCI disabled, it works as it should: a directory "foo" is created, and then initialize is executed.

Tested on Ultimate 64 with the release firmware 3.11.

markusC64 commented 5 months ago

With this replacement you see that the semicolon on line 20 is not necessary:

    20 print#1,"md:foo"
    30 print#1,"cd:foo"

After that, you are not inside directory "foo".

bvl1999 commented 4 months ago

Curiously, this seems to work fine with the C128 device manager's hyperspeed implementation (which is based on, but different from, the hyperspeed kernal one), so I suspect this bug is in the hyperspeed kernal itself. My first guess would be the chkin/ckout caching.

This is with C128DM enabled, and using id 11 for 'hyperspeed'. soft iec is disabled.

[Screenshot attached: 2024-02-14 16-05-51]

markusC64 commented 4 months ago

I am not an expert on the C128 kernal (nor the C64 kernal). But I have the unverified impression that the problem is that the CLRCH call ($FFCC) does not finish a command. Anyway, I first observed the bug in a closed-source assembler program. While trying to reproduce it, I made the mistake of querying the error channel quite often; in that case the kernal works.

bvl1999 commented 4 months ago

I am not an expert on the C128 kernal (nor the C64 kernal). But I have the unverified impression that the problem is that the CLRCH call ($FFCC) does not finish a command. Anyway, I first observed the bug in a closed-source assembler program. While trying to reproduce it, I made the mistake of querying the error channel quite often; in that case the kernal works.

That is entirely possible, and a good pointer for where to start looking. I know the implementations of this in the hyperspeed kernal and in DM are slightly different. Both implement their own clrchn for UCI to avoid having to flush buffers in the soft IEC target when a channel gets reused after a clrchn, and so both do some caching of the last used input and output channels. C128DM's hyperspeed implementation is outside the kernal; it extends the IO code in a way similar to warpspeed128 (and many fastload cartridges for the C64, but unlike such cartridges, it also works for sequential IO, relative files, etc.).

But it has been a few years since I last looked at that code.

bvl1999 commented 4 months ago

Regardless, this doesn't seem to be a firmware bug. It surely should be fixed, but imo it isn't 'high priority'. Sadly, the hyperspeed kernal isn't used as much as we hoped when hyperspeed was implemented.

markusC64 commented 4 months ago

On the other hand, the hyperspeed kernal is one way of ruling out timing problems on the IEC bus as a cause. I currently have a situation where everything seems to make a difference: Jiffydos kernal or not, PAL or NTSC. The hyperspeed kernal cannot be tested because of this bug.

Anyway, I have something that might help increase the hyperspeed kernal's usage: a kernal that has both hyperspeed support and Jiffydos, for the C64. But of course it will need the same fix as the "standard" hyperspeed kernal.

bvl1999 commented 4 months ago

On the other hand, the hyperspeed kernal is one way of ruling out timing problems on the IEC bus as a cause. I currently have a situation where everything seems to make a difference: Jiffydos kernal or not, PAL or NTSC. The hyperspeed kernal cannot be tested because of this bug.

Anyway, I have something that might help increase the hyperspeed kernal's usage: a kernal that has both hyperspeed support and Jiffydos, for the C64. But of course it will need the same fix as the "standard" hyperspeed kernal.

Use a C128 and C128DM :-) (that setup allows for both hyperspeed and jiffydos at the same time, and actually should work with most other custom c128 kernals as well).

There is an unfinished project, the fc3wedge .crt, which is a .crt implementation of hyperspeed, with the idea that it can coexist with the JD kernal, but there are some bugs that neither I nor Gideon have spent the time to fix.

Anyway, the 'quick' fix would probably be to disable the caching of input/output channels. That would cause clrchn to always flush the buffers in UCI, which is not good for the performance of sequential IO, but should avoid this problem.

Anyway... yeah, using the hyperspeed kernal to completely bypass the IEC bus would seem useful for testing, just as long as you keep in mind there are some subtle differences beyond the 'bus' (i.e., from what I recall, the hyperspeed kernal abuses the highest byte of the various file tables, which could cause problems when opening 10 files at the same time; C128DM avoids this by having a little bit of cartridge RAM outside system RAM for keeping track of this data).

bvl1999 commented 4 months ago

So, the offending code seems to be this bit from roms/c64rom/kernal/uci.s just below the _my_chkout label:

        ; there is a pending CKOUT command, same SA?
        lda SECADDR
        cmp UCI_LAST_SA
        beq _ckout_cont ; Yes!  Do nothing, just append

For a quick fix you can just comment out that cmp and beq. That will flush the output buffer for each print# in BASIC. I think a better fix would be to check whether SECADDR is $0f and, in that case, branch ahead past the UCI_LAST_SA check, so the output buffer is always flushed when the secondary address is 15. I don't think this is needed for sequential files, but I haven't considered how this buffer appending works out for relative files.

There is a similar check in _my_chkin, but for reading this should be fine in all cases.
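
For illustration only, a minimal sketch (untested, not from the repo) of that 'better fix', reusing the SECADDR, UCI_LAST_SA and _ckout_cont symbols from the snippet above; the _ckout_flush label is hypothetical, and the compare value gets refined further down the thread:

        ; sketch: always flush when writing to the command channel
        lda SECADDR
        cmp #$0F            ; command channel? (value as suggested here;
                            ; see the #$6F discussion below)
        beq _ckout_flush    ; yes: skip the append shortcut
        cmp UCI_LAST_SA
        beq _ckout_cont     ; same SA as before: just append
_ckout_flush:
        ; fall through to the existing flush path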

bvl1999 commented 4 months ago

@markusC64 does this work for you? https://www.bartsplace.net/download/hskernal394.bin

markusC64 commented 4 months ago

Unfortunately not. I am still receiving "cp1\ncd//os", so obviously still two commands in one.

markusC64 commented 4 months ago

Interestingly enough, just removing the cmp and beq works better...

bvl1999 commented 4 months ago

Unfortunately not. I am still receiving "cp1\ncd//os", so obviously still two commands in one.

Odd, will look at that later today when I'm back home.

Interestingly enough, just removing the cmp and beq works better...

Yeah, it should, and that will do for testing for the moment, but it is not good for performance when writing sequential files, so I still want to do a more proper fix which won't have that issue.

markusC64 commented 4 months ago

Is it possible that you need to compare with #$6F instead of #$0F? A PEEK at that address from BASIC 2.0 makes me think so.

bvl1999 commented 4 months ago

Is it possible that you need to compare with #$6F instead of #$0F? A PEEK at that address from BASIC 2.0 makes me think so.

Maybe, will check that when I'm back home.

GideonZ commented 4 months ago

I think the problem is not in the hyperspeed kernal, but in the way the commands are passed to the command channel, or in how the command channel is processing the commands. The 'just append' branch is there to avoid a UCI command for every single byte. But unfortunately that is how IEC works. I think queuing the bytes is fine as long as the commands are executed one by one when they are separated by a return character (\r). This doesn't happen, as the dispatch function (the routine that takes the data from UCI and feeds it to SoftwareIEC) assumes that it can just push everything at once.

The reason that it works with IEC is that it responds to "unlisten". This probably doesn't happen, although I thought that I did pass the "CLRCH" call. We should check whether the CLRCH happens in between two PRINT commands. I think it does, or it should.

OR... the hyperspeed kernal doesn't clear the last channel byte upon CLRCH?

markusC64 commented 4 months ago

Well, it is time to test the hyperspeed kernal. I think I'll use Bart's version, but with the cmp patched from cmp #$0F to cmp #$6F, and let C64OS Restore do its job. It's reading and writing a megabyte of data, interleaved with "cd" and "md" commands, so quite a lot of data over the UCI. We will see if there are bytes missing. That is the example that fails with the current hyperspeed kernal. I have a reference copy of each file it saves. And yes, the C64 kernal is best tested with C64 software. You can model the kernal's behavior in a simulation, but then you cannot be sure enough that the model equals the kernal.

BTW: "cP\x0D" (read it with C rules) is a valid command with a cr "inside" (at the end). So even this character cannot be used for separating commands. Another example is "m-r" with a suitable address or length.

I managed to port this change to the combined Jiffydos + Hyperspeed kernal, which cannot be published as source code or binary because Jiffydos is still sold. But a patch that can be applied to the original Jiffydos kernal can be published. If we need to optimize Bart's change, that is no problem; I think I still have some free bytes available.

GideonZ commented 4 months ago

BTW: "cP\x0D" (read it with C rules) is a valid command with a cr "inside" (at the end). So even this character cannot be used for separating commands. Another example is "m-r" with a suitable address or length.

Fair point. Well, in that case something goes wrong with CLRCH.

I don't completely agree with your testing strategy though. Testing with a lot of data doesn't make the test any better. You have a failing case right now, and that should be solved. The less data the better, I'd say. (Work smarter, not harder.. ;)

I managed to port this change to the combined Jiffydos + Hyperspeed kernal.

That's actually great!

GideonZ commented 4 months ago

Are we sure that this is not a bug that has been introduced with 3.11? Some control characters have been changed in the communication between the IEC communication layer and the drive code, since they have been separated.

markusC64 commented 4 months ago

Of course you're right. More data does not make it better. But since that program is closed source, it is easier to wait for the result than to disassemble it, understand it, and write an assembler test case. You see, we need both test cases: the mentioned BASIC test case and the assembler test case. And in that test case, it really is the case that CLRCH is called between the commands. But now that we have a small BASIC example to test, we can also test older firmware to see if it has the bug.

Well, since a CMP with #$6F works better, that is clearly needed. But let's wait for Bart's results. It might still be possible that some corner cases need another value, too.

markusC64 commented 4 months ago

Oh, fine. The whole task succeeded without a single error. Every file has exactly the expected content. Every command was transferred correctly (otherwise files would have been created in the wrong directory). So this test revealed just one problem, probably due to a zero page conflict: the software suggested the wrong destination partition, and I chose the correct one. If I recall correctly, it uses the zero page to store it in between. That is the real target of this test: finding problems one was not aware of. If comparing the data helps the actual test, that is great and welcome.

bvl1999 commented 4 months ago

I'm not sure a command is always followed by a \n.

Consider the BASIC program in the initial report; that shouldn't have a \n between the commands, I'd think.

GideonZ commented 4 months ago

I already stood corrected by Markus. :-)

Just the compare with 6F? Hmm... doesn't feel right. I think the issue should be solved in a more fundamentally correct manner.

markusC64 commented 4 months ago

In fact, neither my BASIC program (line 20) nor Greg's assembler program terminates a floppy command with a \r (ASCII 13), nor with a \n (ASCII 10).

BTW: I had to move "MY_OUTLEN" from $9b to $c0 for the kernal with Jiffydos + Hyperspeed, because the address $9b conflicts with Jiffydos. Perhaps we should do the same for the kernal without Jiffydos, in order to have the same addresses in use.

bvl1999 commented 4 months ago

I already stood corrected by Markus. :-)

Just the compare with 6F? Hmm... doesn't feel right. I think the issue should be solved in a more fundamentally correct manner.

I think the 'correct' way would be to also 'emulate' listen/unlisten. From what I can tell, the unlisten that happens during a clrchn is what terminates the command. But that would not be good performance wise.

Checking for $6f feels wrong, and would behave differently from IEC in a specific case; consider the following:

    10 open 1,11,15
    20 print#1, "cd:/usb0/test"

No close, no extra print#. In that case, the 'fix' I suggested would cause the command to not get executed until something else tries to use the UCI soft IEC target, whereas over IEC it should get executed after the print#.

markusC64 commented 4 months ago

Oh, you're right...

I think the 'correct' way would be to also 'emulate' listen/unlisten.

Well, in that case, you also need iecout and iecin to be patched. Consider a low-level assembler program that uses listen, some iecouts, and unlisten. It would be bad if listen were redirected to UCI, but the iecouts were sent to the serial bus.

bvl1999 commented 4 months ago

Oh, you're right...

I think the 'correct' way would be to also 'emulate' listen/unlisten.

Well, in that case, you also need iecout and iecin to be patched. Consider a low-level assembler program that uses listen, some iecouts, and unlisten. It would be bad if listen were redirected to UCI, but the iecouts were sent to the serial bus.

And none of those are vectored, so while that might be doable for a kernal, it wouldn't work for any non kernal hyperspeed implementation.

markusC64 commented 4 months ago

So it might be better not to patch listen/talk and unlisten/untalk. With the Geos MP3 Kernal I/O Edition, you can test that behavior. It really switches to the serial bus when the Geos driver takes over control, because at that moment the low-level calls are used.

Later device detection fails, but that is a completely unrelated problem.

bvl1999 commented 4 months ago

So it might be better not to patch listen/talk and unlisten/untalk. With the Geos MP3 Kernal I/O Edition, you can test that behavior. It really switches to the serial bus when the Geos driver takes over control, because at that moment the low-level calls are used.

Later device detection fails, but that is a completely unrelated problem.

Yeah, there are a few more such implementations around.

Now, in my opinion, using kernal io, but also using the low level functions which aren't vectored rather defeats the point of using kernal io, but alas, we are dealing with 40 years of legacy and people who didn't care about such formalities :-)

markusC64 commented 4 months ago

Indeed. IMHO it is more important to care about the bottlenecks you find by benchmarking. E.g. reading "sequential" (*) files is really, really slow. Not only with hyperspeed, but also with Jiffydos.

(*) which can have type SEQ, USR or even PRG. The point is that you don't use the load call.

bvl1999 commented 4 months ago

Indeed. IMHO it is more important to care about the bottlenecks you find by benchmarking. E.g. reading "sequential" (*) files is really, really slow. Not only with hyperspeed, but also with Jiffydos.

(*) which can have type SEQ, USR or even PRG. The point is that you don't use the load call.

Yes.. and that seems unavoidable as it is byte by byte. Even with 'fast serial' on a c128 that is still rather slow (about as fast as jiffydos).

Working around that would be possible by buffering sequential reads: for example, have a 256-byte buffer and use DMA to transfer 256 bytes at a time, but there is no system RAM we can steal for that.

markusC64 commented 4 months ago

What do you mean by unavoidable? In fact, the IEC device (with Jiffydos) is slower than a 1541 with Jiffydos, and much slower than a CMD HD with Jiffydos connected via the IEC bus. I see that for hyperspeed, and for getting speeds better than Jiffydos, it is not that easy.

bvl1999 commented 4 months ago

What do you mean by unavoidable? In fact, the IEC device (with Jiffydos) is slower than a 1541 with Jiffydos, and much slower than a CMD HD with Jiffydos connected via the IEC bus. I see that for hyperspeed, and for getting speeds better than Jiffydos, it is not that easy.

What I mean is that sequential reads will be a byte-by-byte thing, unless you can have a read buffer on the C64 side.

Load is a lot faster with Jiffydos because it uses a block-mode transfer, which has less overhead. You see the same thing with a C128 and the difference between fast serial and burst mode: loads are way faster than sequential reads.

Soft IEC with Jiffydos over the IEC bus being slower than a 1541 doing the same thing is surprising; I never looked at that.

markusC64 commented 4 months ago

I agree. But for the performance, we should open another issue. Perhaps the same for a possible problem with Jiffydos protocol timing.

bvl1999 commented 4 months ago

I agree. But for the performance, we should open another issue. Perhaps the same for a possible problem with Jiffydos protocol timing.

Jiffydos compatibility and performance should indeed be its own 'issue'; it is separate from this.

However, performance of the soft iec uci target would seem relevant to this discussion, as this entire SA caching trick exists for performance reasons.

Still have to look further into this, but after a chat with Gideon, there are 2 remaining issues with the check for $6f.

The 'proper' solution we see now is to add a clrchn command to UCI, and use that as an 'end of record' indication. That would work for both the command channel and for relative files. It will probably be a bit less good for write performance, but not as bad as flushing the write buffer for every byte, and that seems like a small penalty for having it work correctly.

markusC64 commented 4 months ago

I understand your answer as follows:

* Your simple patch is for the current Ultimate firmware.

* The better hyperspeed kernal implementation will require a new firmware on the ultimate.

Thus, we somehow need both.

bvl1999 commented 4 months ago

I understand your answer as follows:

* Your simple patch is for the current Ultimate firmware.

* The better hyperspeed kernal implementation will require a new firmware on the ultimate.

Thus, we somehow need both.

The '$6f' patch addresses the original issue you reported, so as a workaround it should be fine, as long as you don't use relative files and keep in mind the issue with the two-line BASIC program I mentioned a few comments up.

Proper solution requires a firmware change indeed (implement a clrchn command for uci), and also requires a small change to the hyperspeed kernal (make clrchn send a clrchn command to uci).

I think Gideon will take care of the firmware side, I will take care of the hs kernal (and c128dm) side.

markusC64 commented 4 months ago

Proper solution requires a firmware change indeed (implement a clrchn command for uci), and also requires a small change to the hyperspeed kernal (make clrchn send a clrchn command to uci).

If you push those c64 kernal changes to your github, I'll apply them in the combined Jiffydos + Hyperspeed kernal.

bvl1999 commented 4 months ago

Proper solution requires a firmware change indeed (implement a clrchn command for uci), and also requires a small change to the hyperspeed kernal (make clrchn send a clrchn command to uci).

If you push those c64 kernal changes to your github, I'll apply them in the combined Jiffydos + Hyperspeed kernal.

I'll do a pull request so it ends up in Gideon's repo, once firmware + kernal fix are both there.

bvl1999 commented 4 months ago

Just confirmed the problem indeed also exists with relative files, and obviously the '$6f patch' won't fix that.

Still trying to figure out if passing clrchn to uci is enough or if ckout also needs patching.

Oh, and I had a bit of a 'duh' moment: the obvious 'end of record' indication on IEC is EOI.

bvl1999 commented 4 months ago

So a thorough read of the kernal ROM shows that a 'record' or command always ends with EOI, and EOI can be triggered from clrchn, ckout and close.

Also the only situations where it can be ignored are clrchn and ckout for sequential writes.

That actually makes things easier: doing it 'right' simply means calling uci_abort from ulticlrchn_lsn and completely removing the 'pending ckout' check from _my_ckout (uci_setup_cmd will take care of executing any pending command).

Now, that will have a negative impact on sequential file writes; how much depends on how often a program calls ckout in between writing bytes.

Addressing that performance impact requires knowledge about the type of file currently being written.

For sequential files, adding to the buffer is not a problem; in all other cases, UCI should handle things 'right now', as it either got a complete command or a complete record for a relative file. The kernal does not have this knowledge; the soft IEC target does. So regaining that 'lost performance', if important, should happen in the UCI soft IEC target, and not in the kernal.

Now.. is this also needed for reading? My current impression is no, so for now I'm trying this with the 'reading side' of things untouched, only writing modified.

Going to functionally test this, and get a bit of an idea about write performance impact, and after that will commit the changes to my fork of the 1541ultimate repo.
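
For reference, a rough sketch (untested, not the actual diff) of the shape of that change, using only the ulticlrchn_lsn, uci_abort, _my_ckout and uci_setup_cmd symbols named above:

        ; in the clrchn listener handler: terminate any pending command
        ; or record in the UCI target, playing the role EOI has on IEC
ulticlrchn_lsn:
        jsr uci_abort
        ; ... continue with the original clrchn handling

        ; in _my_ckout: the 'pending CKOUT, same SA?' compare/branch from
        ; the earlier snippet is removed entirely; uci_setup_cmd executes
        ; any still-pending command before a new one is started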

bvl1999 commented 4 months ago

New test kernal: https://www.bartsplace.net/download/hskernal394.bin

This should address the original 'print#' issue, as well as the second print# issue discussed, and work correctly for writing relative files.

markusC64 commented 4 months ago

@bvl1999 @GideonZ Can we use these zero page addresses instead of the ones from the current hyperspeed kernal?

    MY_OUTLEN       = $c0
    UCI_LAST_CMD    = $92

They have been changed to avoid conflicts with Jiffydos. The benefit would be that I can use the same addresses in my Jiffydos + hyperspeed kernal. So both kernals would use the same zero page addresses.

bvl1999 commented 4 months ago

@bvl1999 @GideonZ Can we use these zero page addresses instead of the ones from the current hyperspeed kernal?

    MY_OUTLEN       = $c0
    UCI_LAST_CMD    = $92

They have been changed to avoid conflicts with Jiffydos. The benefit would be that I can use the same addresses in my Jiffydos + hyperspeed kernal. So both kernals would use the same zero page addresses.

$c0 is part of tape motor control, $92 is used as part of timing for tape... as tape is not supported by the hyperspeed kernal anyway, I don't see a problem with this suggestion.

bvl1999 commented 4 months ago

They have been changed to avoid conflicts with Jiffydos. The benefit would be that I can use the same addresses in my Jiffydos + hyperspeed kernal. So both kernals would use the same zero page addresses.

Just wondering if the fact that those two are used by the tape handling code of a regular kernal means that software which uses kernal IO but doesn't support tape might use those locations for its own purposes, and hence break with this change.

While this is the C64, it's not that different from the C128, where I kept running into software using whatever supposedly unused memory locations I picked for things like those two variables, until I was able to completely move them out of system RAM. Using addresses also used by Jiffydos causes the obvious issue for such a combined hyperspeed/Jiffydos kernal, but it also means the likelihood of conflicting with some application code is small, as Jiffydos has been around since forever, and such software would also have had problems with Jiffydos.

So from a kernal/BASIC point of view this change should be completely fine, but I do think it would need a bit of compatibility testing with popular C64 software which would benefit from the hyperspeed kernal. I'm thinking of BBS software, 'native' development tools, and various productivity tools.

bvl1999 commented 4 months ago

@markusC64 https://www.bartsplace.net/download/hskernal394-2.bin has your suggested change so it can be tested.

markusC64 commented 3 months ago

Some info regarding $c0 in the zero page; short summary: IDE64 uses $c0, and therefore any IDE64-compatible C64 software cannot use it.

Greg Nacu — today at 08:48: @markusC64 Ah... Guess what just happened? In 1.06 beta I made a change to file.o kernal module, and to save space and be more efficient I moved the file flags byte to zero page. When I was looking around for a free byte, I picked $c0 (this was before our conversation.) It turns out it totally broke compatibility with IDE64! It's just like we were talking about with HyperKernal. So, I moved it to $cc (which won't be used by IDE64, since it's used by the KERNAL ROM's screen editor (cursor flashing)) and that fixed it. [...]

markusC64 today at 17:52 Do I understand that right: $c0 cannot be used by C64 software because that would break IDE64? That would mean that we can use it for hyperkernal... without breaking other C64 software (you cannot break a software that is already broken 🙂 )

bvl1999 commented 3 months ago

markusC64 today at 17:52 Do I understand that right: $c0 cannot be used by C64 software because that would break IDE64? That would mean that we can use it for hyperkernal... without breaking other C64 software (you cannot break a software that is already broken 🙂 )

It would of course also break IDE64 :-)

Now... is that a sane combination? And how many people would really have noticed software using $c0 because of having IDE64?

Interesting info nonetheless, thanks.

markusC64 commented 3 months ago

You cannot use the hyperspeed kernal and IDE64 in parallel anyway. But IDE64 is quite old, so if software is incompatible with IDE64, one might call that software broken. On the other hand, using the same locations as IDE64 makes developing C64 software easier, because only one zero page location has to be avoided instead of two.

Anyway, you cannot be absolutely sure, for any zero page location, including the ones currently used.

bvl1999 commented 3 months ago

Anyway, you cannot be absolutely sure, for any zero page location, including the ones currently used.

Absolutely, and I'm rather happy I managed to avoid using any 'supposedly unused' addresses anywhere in system ram for my c128 hyperspeed code in DM. There are no perfect choices in this.

It would have been nice if we had 4 available addresses in the UCI register address range which we could turn into RAM for this; that would really solve this in a very nice way. But alas, we don't have that, and I really don't feel like stealing more addresses in the $de00-$dfff range.

Anyway...

After having put some more thought into this, this is what I am thinking:

For the hyperspeed kernal, using addresses which conflict with JD is not a problem, and imo actually the best choice. JD has been around for a very long time and is fairly popular, so it is unlikely that software which would benefit from hyperspeed hasn't also seen a lot of use with JD; and, as far as the hyperspeed kernal is concerned, it cannot coexist with JD, so using the same zp addresses is not a problem. Such software might also avoid using $c0 so it works with IDE64, but IDE64 is way less common than JD, so I'd rather think there is more software around which has been tested with JD than with IDE64.

Now, you of course have this combined JD and Hyperspeed kernal, and hence will have to use some extra zp addresses, so for you moving the 2 addresses makes sense, and you should totally do that for that use, but I don't think the 'standalone' hyperspeed kernal should be adapted for that.

Also, I think that if we continue on this discussion, it should be its own feature request, so we can close this issue.

markusC64 commented 3 months ago

It would have been nice if we had 4 available addresses in the UCI register address range

Nothing to add; that would be the perfect solution.

it should be its own feature request, so we can close this issue.

Ok.