Closed divergentdeveloper closed 2 years ago
I bought a Retron in December and have played it for at least 20 hours, including Ms. Pac-Man for 10-15 minutes. I haven't had a single freeze or other issue in any game.
That you have tried several different versions makes me wonder if your Retron isn't defective. The first one I received wouldn't boot at all and had to be exchanged. The replacement is the one that has worked perfectly.
Have you tried different micro SD cards? I've found lots of issues with other systems that rely on micro SD; often changing to a different card fixes whatever problem I'm having.
EDIT: I've been using Stella 6.6. I haven't had any problems with paddles, either.
I've been wondering as well... could be that it's defective, but since I see many others reporting the same issues, I wasn't sure. And I've played Ms. Pac-Man for 10-15 minutes with no issues either... I can play it for a while without noticing it since games usually don't last longer than a few minutes. But for 20 hours, I would expect you'd have seen at least a few freezes.
And the other thing that put some doubt in my mind, is that the original FW runs without any issues, except that it's a.... very limiting UI :)
I tried with the "DONT_OVERCLOCK" option (overclocking disabled) and it still crashes, and what I saw didn't look like any frame drops I'd ever seen.. it was a mess on screen. The documentation mentioned it was something to try for "buggy" hardware, which is possible in my case. It's just that the original FW runs so well with no issues.
I just bought a Harmony cart because it would circumvent the bad UI of the original FW, and will work in case I end up getting a real 2600 if the R77 isn't working out.
I got this Retron from eBay so I can't return it like on Amazon if I think it's defective.. especially since it's not really defective, it's only the custom FW that has issues with it, and it's hard to prove it's the HW that's the problem.
Oh and I am swapping between 3 SD cards of different brands. Same issues.
Judging from the AtariAge threads on the subject and all of the issues people have had, I don't think your experience is unusual. My experience of it working perfectly may be the outlier.
My R77 works better than my MiSTer running the 7800/2600 core and has become my default choice for 2600. The MiSTer has more compatibility issues than my R77, the frequent video timing issues trip up the MiSTer's HDMI output, and last I checked the MiSTer 7800/2600 core isn't really friendly with original paddles, even with USB adapters that work on everything else perfectly.
Yeah that's the experience I heard about that got me to get it. :) it's only now that I have it that I see all of these folks having issues... They are very consistent though, so that's why I was thinking maybe an issue with the latest versions or that there are different versions of the HW...
I'll try games with the stock FW and see if everything is ok. Stella 6.6 is much better, but that would be an OK compromise for me and kind of salvage my investment. :)
We think it might be a problem with the CPU. We overclocked it a bit to have no lags with modern ARM based games. The overclocking is still within the specs, but maybe there are CPUs with worse quality.
Therefore the next release (6.7) will have an option to disable the overclocking. Maybe that solves your problem.
The option is already there 😏 --- that's DONT_OVERCLOCK.
I have never been able to reproduce those crashes, but I'll give it a try. However, as it is a spurious issue that affects only some people I suspect that it is hardware related. The new firmware uses considerably more resources and more memory.
One thing worth trying might be a different power supply.
@divergentdeveloper Did you already try DONT_OVERCLOCK?
Yes, I tried DONT_OVERCLOCK. I think I saw dropped frames (Ms. Pac-Man sprites didn't get cleared and ended up drawing on top of each other) and it still ended up crashing.
It's a good idea to include it as a setting in the UI :) It will be way easier/faster for me when testing...
I suspect that even 1 GHz "running hot" is too much for the flimsy R77s. I just saw that there's only one vent on the bottom of the thing, so that probably doesn't help either.
I've seen the stock FW run nicely without any issues for a long time, so what I want to try next is to disable all the extra filters, time machine, etc that the new Stella offers and try to match the config of their original FW, and see if it doesn't crash.
Did the DONT_OVERCLOCK extend the time until a crash? Regarding the ventilation, have you tried to run it upside down?
I wonder if the chip shortage caused Hyperkin to use 2nd grade chips. IIRC we had no reports earlier on. From when is your console, is it an orange one maybe?
It didn't seem to extend the time before the crash, but that was already varying between 1 minute and 14 minutes so it's hard to tell.. It crashed after 2 and a half minutes, so it could have been a 1 minute crash that took 2 minutes with DONT_OVERCLOCK. :)
I haven't tried to run it upside down but that's definitely on my list since I saw that the vent was on the bottom :) Maybe I'll try with a fan pushing air down the vent too. I'll do some tests and report back here with the results.
I have a black one, and I don't know how long it was in stock... I think you might be right with the 2nd grade chips, and it could be that they thought this was fine to put into production, because you don't see the crashes with the original FW. And I think they stopped producing them when they realized this :)
Oh I just saw @DirtyHairy's note: Yes, I did try to switch power supplies.
I'm pretty sure too that it's hardware related.
I did some testing with the unit upside down, with a fan, without all the tv effects, phosphor, etc. and it still crashes.
I tried with DONT_OVERCLOCK (see attached pic, I hope I did this right) and I still get the same result.
Sadly, it really looks like a HW issue. These batches of R77s can't seem to run this FW. It's really puzzling to me since the stock FW with Stella 3.5.2 runs without any issues, but there is probably more going on with v6.6 than I can see on screen...
Unless anyone has any other ideas to try, I think I'll be stuck with the stock FW and wait for my Harmony cart, unless I can find another R77 that doesn't have this issue... but that seems unlikely/difficult as they seem to have stopped producing them.
First you might want to try to switch the renderer to "Software" (you have to switch to advanced settings to do so). That disables using the GPU's hardware acceleration.
If that still doesn't help, you could try one of the first community editions (somewhere on AtariAge) which was based on an older Stella version (3.51 IIRC). This should emulate identically to the stock version.
The very last community edition based on the old Stella was 3.9.3. So that's the one to look for.
Good idea! Stock probably doesn't use OpenGLES, I will try that...
Oh I was looking for that! I thought I had seen it but couldn't find it on GitHub so I thought I had imagined it... I will definitely look for it as it would give me the nice UI of the CFW
@sa666666 thanks so much! I will hunt this down
Seems that the OP removed the links when the new version was released, I don't see any links available to get it now... :(
Wayback Machine to the rescue: https://www.dropbox.com/s/q2965rrzpo0jq2e/sdcard.remo.20181120-1353CB.zip?dl=0
I think that is the latest version. You have to search for the old AtariAge links, before the latest migration. http://atariage.com/forums/topic/281462-retron-77-community-build-image/
@thrust26 niiice! many thanks I had not found that one! It's probably v.3.9.3 that I was looking for.
I also found "sdcard.remo.20190119-1727.test" in the attachments of the thread, which was v3.9.4 and I got to test it during lunch and.... no crash! It ran for an hour without any issues. I will test this more extensively but so far this is great news for me :)
I will also try 6.6 with the software rendering like you suggested.
Crashes on 6.6 with software rendering and DONT_OVERCLOCK
How adventurous are you feeling? In order to drill down to the source of those crashes we'd need to run Stella (possibly built with debug symbols) from the command line and capture the output. This either requires a supported ethernet dongle and an SSH connection or a serial connection. The serial connection arguably is easier to set up, but it requires soldering a few wires to unused pads on the R77 board and a UART-to-USB dongle (a few bucks on Amazon). If you want to go either way I'll be happy to assist you.
If time is not an issue, I might be adventurous enough for either option :)
I might actually have one of those ethernet dongles, just saw the info on which ones would work and how to try it. I'll give it a go this weekend...
Also, reading the documentation, it seems that DONT_OVERCLOCK is for developer mode only. I don't think I had the console in that mode so I'll also try that again.
It took 17 minutes for Ms. Pac-Man to crash, but it went back to the launcher this time, with developer mode enabled. Launcher is operational afterwards... Very interesting! I've seen hundreds of hangs so far, but never did it make it back to the launcher.
That was the latest version with "Software" renderer, right?
Interesting indeed, but as I said, guesswork is not gonna take us anywhere.
@divergentdeveloper If you would be able to get a shell on the device this would be great. Ethernet and SSH are more hassle to set up, but you can use scp to transfer files once you've got it working (i.e. to copy and run a debug build of Stella). Serial is easier to set up if you are comfortable with modifying your device, but you can't transfer files over the serial connection (easily). You can find instructions for accessing the UART here: https://github.com/stella-emu/stella/wiki/Retron-77
@thrust26 Exactly, with time machine off, all TV effects off... I could see a lot of frames dropped, like the machine was struggling to keep up.
@DirtyHairy Yes, that's my next step :) I just wanted to make sure I tried the DONT_OVERCLOCK correctly and still got a crash.
My USB to UART dongle is arriving tomorrow, I'll let you know when I have a shell going. It seems the easiest option and not too far above my skill level.. I'm guessing I can just swap the SD card back and forth to transfer files when needed? That's not too much trouble.
Still speculating, but that doesn't sound good. Ms. Pac Man doesn't require that much CPU performance, it should run well at 1 GHz. To me it seems like the CPU is already overheating and maybe throttling (but then it should not crash).
@DirtyHairy Do you know if the CPU is permanently running at the given frequency or if it uses DVFS?
@thrust26 It doesn't look good either, if you want to see. I made a little 3.5.4 vs 6.6 comparison video: https://www.youtube.com/watch?v=5ABeCHBX6OM
It seems to run slower as well.
> @DirtyHairy Do you know if the CPU is permanently running at the given frequency or if it uses DVFS?
No, there is no governor, the CPU runs permanently at the configured speed. If it throttles then this must be the chip itself.
> My USB to UART dongle is arriving tomorrow, I'll let you know when I have a shell going. It seems the easiest option and not too far above my skill level.. I'm guessing I can just swap the SD card back and forth to transfer files when needed? That's not too much trouble.
Yep, that will work. Thanks a lot!
> @DirtyHairy Do you know if the CPU is permanently running at the given frequency or if it uses DVFS?
> No, there is no governor, the CPU runs permanently at the configured speed. If it throttles then this must be the chip itself.
That doesn't leave many options, does it? With constant frequency, the CPU is not stressed more in Stella 6.x than 3.x, and hardware acceleration is not used in either, so it's not the GPU either. What's left? RAM?
Got a USB2UART device that had an issue so spent most of my time today debugging that! Now that it works I still don't have a shell connection, but it's probably my bad solder using the included wires.
I'll clean up and try again tomorrow, I've got some better wires coming in as well.
Keep in mind that you need to cross RX and TX, i.e. RX goes to TX and vice versa.
Yes! That was it! Seems my soldering was fine :) I had completely forgotten about this...
Many many thanks! I'm ready for the next steps then :)
Nice 😏
Now, how to proceed. When the R77 starts up it launches a dumper process, and this process in turn spawns stella. So, what we need to do is kill the dumper and stella. After this is done we are free to start stella ourselves on the terminal and observe its stdout and stderr while it runs and crashes.
First do a
# ps aux
The process list should include stella and the dumper. After that, do
# killall -9 dumper
# killall -9 stella
It is fine if the second command fails, the child process should die with dumper anyway, I just added the second command to be 100% sure. Check the process list again; the two processes should be gone now. At this point you can launch stella manually by doing
# stella /mnt/path/to/rom
Note that the SD card is mounted on /mnt, so /path/to/rom refers to the path to the (Ms. Pacman) ROM on your SD card. This will start stella and launch the ROM. After stella has crashed, the first thing is to check whether Linux is still running, i.e. whether you can still type commands. If it is, please paste the output of Stella here and also paste the output of doing
# dmesg
If Linux itself has crashed, well, that's information, too 😛 Thank you again for your help.
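The manual steps above can be wrapped in two small shell helpers so that the exit status is kept next to Stella's output. This is only a sketch and not part of the R77 firmware: run_and_capture and describe_status are hypothetical names, the log paths in the usage comment are assumptions (they presume the SD card on /mnt is writable), and it relies on the usual shell convention that a process killed by signal N is reported as exit status 128 + N.

```shell
# Hypothetical helpers; names and paths are illustrative only.

# Run a command with stdout and stderr saved to a log file,
# append the exit status to the log, and return that status.
# Usage: run_and_capture LOGFILE COMMAND [ARGS...]
run_and_capture() {
    cap_log=$1
    shift
    "$@" >"$cap_log" 2>&1
    cap_rc=$?
    echo "exit status: $cap_rc" >>"$cap_log"
    return "$cap_rc"
}

# Translate an exit status into a short description.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(($1 - 128))"
    else
        echo "exited with status $1"
    fi
}

# Example (on the R77 shell, after killing dumper and stella):
#   run_and_capture /mnt/stella.log stella /mnt/path/to/rom
#   describe_status $?
#   dmesg > /mnt/dmesg.txt
```

On Linux, signal 6 is SIGABRT and signal 11 is SIGSEGV, so the status alone already tells an abort apart from a segmentation fault even when the shell's own message scrolls away.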
Success!
When it crashed the first time, with software rendering, no overclock:
malloc_consolidate(): unaligned fastbin chunk detected
Aborted
I also tried having it crash with OpenGLES right after, no overclock, and got:
Segmentation fault
I've got the output of dmesg in a text file here: dmesg.txt
Thanks a lot. Nothing interesting in that dmesg. The two crashes hint at memory corruption. This may be caused by either bad hardware or a bug somewhere in the stack. Let me prepare a debug build that will give a readable backtrace. In the meantime, could you retry a few more times and check how the error message fluctuates? I honestly don't think this has anything to do with software vs. hardware rendering.
Just got a new one with a lot more meat, here it is attached. I'll try to capture a few here today while working, and post them here if they are new and interesting.
I agree with the software vs hardware, it's just my habit of mentioning what config I changed in the tickets. :) I put back the original config with OpenGLES, TV effects and everything since it crashes more frequently this way. I'll probably put overclock back since I've had half-hour runs without crashing and that's not what we're looking for ;)
Another interesting one: it didn't crash, it's still running but I've just got warnings that end with
[ 1184.929432] Fixing recursive fault but reboot is needed!
warning.txt
and it crashed shortly after :)
This third crash is almost identical to the first: crash.txt crash3.txt
crash2 lists a "hard LOCKUP on cpu 0". How can that happen?
A few more:
Another very vague idea: Can you try a different, stronger power supply? What is the one you are using rated at?
Or maybe @DirtyHairy already knows what is going on.
@DirtyHairy Do we know for sure that the original firmware is running at 1GHz?
@thrust26 Sure. I've tried 3 so far but I don't have logs of those crashes. I can do a test run with my best power supply and see the result :) Current one is a 2.1A generic adapter.
> @thrust26 Sure. I've tried 3 so far but I don't have logs of those crashes. I can do a test run with my best power supply and see the result :) Current one is a 2.1A generic adapter.
Thanks, but if you already have tested multiple adapters, I am pretty sure my idea is wrong.
Thanks a lot! I am afraid this is pretty conclusive, no need for running a debug build: this is either a kernel bug or faulty hardware. None of these errors in dmesg can be caused by userspace alone, and this rules out memory corruption in Stella. As only some consoles are affected I am 99% positive that hardware is the issue, probably SDRAM.
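To scan a saved dmesg capture for this kind of kernel-level evidence, a filter along the following lines may help. It is a rough sketch: kernel_faults is a hypothetical helper, and the pattern list is an assumption that covers only the two messages quoted in this thread plus the generic Oops and BUG: markers, so an empty result does not prove the kernel is healthy.

```shell
# Hypothetical helper: print lines from a kernel log (read on stdin)
# that match a few well-known fault markers.
kernel_faults() {
    grep -E 'hard LOCKUP|Fixing recursive fault|Oops|BUG:'
}

# Example:
#   dmesg | kernel_faults
#   kernel_faults < /mnt/dmesg.txt
```

Note that grep exits with status 1 when nothing matches, which matters if the helper is used in a script running with set -e.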
I'll try to build a version that clocks RAM at 480 MHz (instead of 624 MHz) to see whether this works any better.
> @DirtyHairy Do we know for sure that the original firmware is running at 1GHz?
Yes 😏 Besides, that option does not set the clock to 1.2 GHz explicitly, but just keeps it the way it was at boot.
> I'll try to build a version that clocks RAM at 480 MHz (instead of 624 MHz) to see whether this works any better.
Did you increase the RAM speed too? Else the original firmware should have similar problems, no?
> Did you increase the RAM speed too? Else the original firmware should have similar problems, no?
No, I think this is bad hardware. Maybe they changed the RAM chips. It is very possible that the new firmware uses more RAM bandwidth, and maybe this exposes the issue.
@divergentdeveloper I have a version of the bootloader that reduces the DRAM clock to 480 MHz. Do you have access to a Linux machine and feel confident enough to write it to the SD card with dd (I'll give you the specifics), or should I prepare a full SD card image?
@DirtyHairy I don't have a Linux machine handy, but it shouldn't be a problem if I did :)
If you've got the setup to prep the SD image and it's not too much trouble, I think that'd be the easiest/fastest.. If not, I can get a VM up and running later this week and do the copy.
Unless you know of a Windows app that lets you browse and write to a Linux FS? Just used one today for something but it's read-only, and the other one I saw was commercial and cost money.
Hi!
I've seen this reported on the AtariAge forum, and I can reproduce it 100% of the time on my Retron with the latest builds. The original Retron FW does not have this issue.
I've tried version 6.6, 6.52 and 6.51 and they all seem to have this problem. I've reflashed and tried all the tips I saw in the forum like disabling the time machine, etc, and I always get the same result.
It looks like it's working fine, and sometimes you can play for minutes without any issue, but every game I've tried ends up freezing.
The ROM I use for these tests is Ms. Pac-Man. I just let the attract mode play, and it always freezes up. I've seen it take as long as 13 minutes to freeze, sometimes it's only one minute. I've run the same ROM on the original FW for an hour without any freeze.
Could it be that this problem has been present for a while and no one noticed it? I've done all the checks on my side and I can reproduce this 100% of the time on the brand new Retron I opened yesterday.
I bought the R77 for this project so I don't mind doing tests or providing more info to solve this issue... and maybe I missed something in my tests and it's fine :)