BertoldVdb / ms-tools

Program, library and reference designs to develop for MacroSilicon MS2106/MS2109/MS2130 chips.
MIT License

MS2131 device blinks while reading RAM #14

Open krokodilerian opened 10 months ago

krokodilerian commented 10 months ago

This could be pretty much expected, but still:

Related to #7: because the resolution/aspect ratio can change, we have a tool based on ms-tools (https://github.com/FOSDEM/video-ms213x-status) that reads 3 locations from memory and continuously reports the resolution/signal information, so we can react to changes.

The problem is that these memory reads can cause all kinds of blips and losses of signal. In some extreme cases they cause a hang of the capture or a black screen being returned.

This is amplified and clearly visible if you do memory reads in a tight loop while capturing, or even just while using the loop-out. Adding sleeps between the reads hides the issue but doesn't seem to actually fix it, as loss of signal is still observed.

Is there a different way to get the resolution? Alternatively, is there a way to make the capture stream die if the resolution changes, or something to sync these reads to, so that they don't affect the device?
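The following is a minimal Go sketch of the kind of change-detecting poller described above. It uses a hypothetical `MemReader` interface in place of the real ms-tools device access. The names `MemReader`, `pollChanges` and `fakeReader`, and the address used, are illustrative only and are not taken from video-ms213x-status. The sketch reads each location once per interval and reports a value only when it changes. As noted above, spacing the reads out may reduce the disturbance without removing it.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// MemReader is a hypothetical abstraction over "read n bytes of device RAM at addr".
type MemReader interface {
	ReadRAM(addr uint16, n int) ([]byte, error)
}

// pollChanges reads every address once per round, for the given number of rounds,
// and calls onChange whenever the bytes at an address differ from the previous read.
func pollChanges(r MemReader, addrs []uint16, n int, interval time.Duration, rounds int,
	onChange func(addr uint16, val []byte)) error {
	last := make(map[uint16][]byte)
	for i := 0; i < rounds; i++ {
		for _, a := range addrs {
			v, err := r.ReadRAM(a, n)
			if err != nil {
				return fmt.Errorf("read 0x%04X: %w", a, err)
			}
			if prev, ok := last[a]; !ok || !bytes.Equal(prev, v) {
				last[a] = v
				onChange(a, v)
			}
		}
		// Spacing between rounds; per the report, this may only hide the problem.
		time.Sleep(interval)
	}
	return nil
}

// fakeReader returns scripted values so the sketch runs without hardware.
type fakeReader struct {
	script [][]byte
	calls  int
}

func (f *fakeReader) ReadRAM(addr uint16, n int) ([]byte, error) {
	v := f.script[f.calls%len(f.script)]
	f.calls++
	return v, nil
}

func main() {
	r := &fakeReader{script: [][]byte{{0x07, 0x80}, {0x07, 0x80}, {0x05, 0x00}}}
	err := pollChanges(r, []uint16{0x1234}, 2, 10*time.Millisecond, 3, func(a uint16, v []byte) {
		fmt.Printf("0x%04X changed: % X\n", a, v)
	})
	if err != nil {
		fmt.Println(err)
	}
}
```

With a real device, `rounds` would be unbounded and `onChange` would trigger whatever reaction is needed, such as restarting the capture.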

krokodilerian commented 10 months ago

An update: reading from different memory regions seems to help A LOT (https://github.com/FOSDEM/video-ms213x-status/commit/765752614e1c745a5c3b3004141ed93914650d1c). However, we're still not completely sure it's the correct solution, because there was one hang after 4-5 h. We poll these locations every second.

krokodilerian commented 10 months ago

Also: we have 4 boards, and we tried each board's firmware on the others. It definitely looks like the initial firmware we had is really bad. The Hagibis board with its own firmware is a very stable combination:

Hagibis board:
  a76ac7aac8cf3a66d9378ef618b3ff39f2d8a1f3  ms2131-fosdem-20230902.bin  dies immediately
  ecae9d8dcd0cfc67b6bed817c2874972083143db  yuncun-cheap.bin            56s
  7db988903e849f9787bd36e15747a90bb6ad92a7  yuncun-expensive.bin        56s
  1adc7d6b1d40a53263c0d9ffe5591a0941b23479  hagibis.bin                 1:06

Demo board:
  a76ac7aac8cf3a66d9378ef618b3ff39f2d8a1f3  ms2131-fosdem-20230902.bin  dies immediately
  ecae9d8dcd0cfc67b6bed817c2874972083143db  yuncun-cheap.bin            reports USB2 video?
  7db988903e849f9787bd36e15747a90bb6ad92a7  yuncun-expensive.bin        19s
  1adc7d6b1d40a53263c0d9ffe5591a0941b23479  hagibis.bin                 4s

So I guess the issue is more of a software one.
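The 40-hex-digit values in the results above look like SHA-1 digests, the format `sha1sum` prints. Assuming that is what they are, the Go sketch below could identify which of the tested images a given dump is. The names `knownFirmware` and `identify` are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"os"
)

// knownFirmware maps the SHA-1 digests listed in the test results to file names.
var knownFirmware = map[string]string{
	"a76ac7aac8cf3a66d9378ef618b3ff39f2d8a1f3": "ms2131-fosdem-20230902.bin",
	"ecae9d8dcd0cfc67b6bed817c2874972083143db": "yuncun-cheap.bin",
	"7db988903e849f9787bd36e15747a90bb6ad92a7": "yuncun-expensive.bin",
	"1adc7d6b1d40a53263c0d9ffe5591a0941b23479": "hagibis.bin",
}

// identify returns the hex SHA-1 of a firmware image and its known name, if any.
func identify(image []byte) (digest, name string, ok bool) {
	sum := sha1.Sum(image)
	digest = hex.EncodeToString(sum[:])
	name, ok = knownFirmware[digest]
	return digest, name, ok
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fwid <firmware.bin>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	digest, name, ok := identify(data)
	if ok {
		fmt.Printf("%s  %s\n", digest, name)
	} else {
		fmt.Printf("%s  (unknown firmware)\n", digest)
	}
}
```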

BertoldVdb commented 10 months ago

It's possible the registers you are reading have some side effects, or it is interrupting another operation on the same block that must be atomic.

Does the device work completely stable when you are not touching it?

Does it stay stable when reading another memory location that is unrelated to the HDMI? For example RAM 0x0000. After that try 0x1FD0.

If yes we can try to copy the registers to a safe place in the FW main loop or vsync interrupt.
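One general way to make a firmware-side copy safe to read over USB is a sequence counter (a "seqlock"). The firmware bumps a counter to an odd value before updating the copy and back to an even value afterwards. The host retries whenever the counter was odd, or changed during its read.

The Go sketch below illustrates that technique under an assumed memory layout: one counter byte followed by the copied registers. It is not necessarily what the eventual firmware modification does. The names `readSnapshot` and `writeSnapshot`, and the layout, are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// readFunc is a hypothetical "read n bytes at addr" primitive (e.g. one USB transfer).
type readFunc func(addr, n int) []byte

var errBusy = errors.New("snapshot kept changing; giving up")

// readSnapshot is a seqlock-style reader over a hypothetical layout:
// mem[base] is a counter that is odd while the firmware updates the copy,
// and mem[base+1 : base+1+n] holds the copied registers.
func readSnapshot(read readFunc, base, n, retries int) ([]byte, error) {
	for i := 0; i < retries; i++ {
		before := read(base, 1)[0]
		if before%2 == 1 {
			continue // firmware is mid-update; try again
		}
		data := read(base+1, n)
		after := read(base, 1)[0]
		if before == after {
			return data, nil // no update happened while we were reading
		}
	}
	return nil, errBusy
}

// writeSnapshot shows the matching update order on the firmware side. It is written
// in Go only for illustration; real firmware would do this in its main loop or
// vsync interrupt. An 8-bit counter keeps its parity when it wraps from 255 to 0.
func writeSnapshot(mem []byte, base int, regs []byte) {
	mem[base]++ // now odd: update in progress
	copy(mem[base+1:], regs)
	mem[base]++ // even again: copy is consistent
}

func main() {
	mem := make([]byte, 16)
	read := func(addr, n int) []byte {
		out := make([]byte, n)
		copy(out, mem[addr:addr+n])
		return out
	}
	writeSnapshot(mem, 0, []byte{0x07, 0x80, 0x04, 0x38})
	snap, err := readSnapshot(read, 0, 4, 5)
	fmt.Println(snap, err)
}
```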

markvdb commented 10 months ago

We found the resolution in the other mem region mentioned. It's much much more stable there.

gerryd commented 10 months ago

> It's much much more stable there.

More stable, but not fully so: it falls over after 3 to 4 hours. And yes, it is acceptably stable (as in: it will work just fine for over 10 hours) when we don't touch it. Also, what seemed to be copies of the data, in areas that cause fewer issues when read, are not completely accurate. These regions sometimes don't get updated when we expect them to.

@BertoldVdb: your suggestion to modify the main loop so that it copies the data we need to a safe place sounds very interesting! From what we've seen, I think your assessment of why it fails is correct. I don't have the faintest idea where to start with modifying the relevant bits of the firmware, though. Would you perhaps be able to help us out with that?

In case it is relevant: a copy of our current firmware is here: https://github.com/FOSDEM/video/blob/master/hardware/edid/firmware_fosdem.bin. It is a very slightly modified version of the firmware we found on one of our devices, and it is the most stable one so far.

BertoldVdb commented 10 months ago

Hi,

Does it stay stable when reading another memory location that is unrelated to the HDMI? For example RAM 0x0000. After that try 0x1FD0.

Sincerely, Bertold

(Sent by email in reply to Gerry's comment of Thu 26 Oct 2023, 08:16.)


krokodilerian commented 10 months ago

Reading 4 bytes from 0x0000 every 1 ms (the hard test): it survived 3 h, then crashed (no video could be read from the device). Will test with 0x1FD0 next.

krokodilerian commented 10 months ago

1 h 30 min until the hang with 4 bytes read from 0x1FD0. I'm now testing without any reads, to make sure the reads are the culprit.

krokodilerian commented 10 months ago

OK, so after a lot of testing, the issue currently seems to be caused by the USB hub ("docking station") that the capture board is connected to. I'll add more details later (hub modes, etc.).

gerryd commented 10 months ago

> Does it stay stable when reading another memory location that is unrelated to the HDMI? For example RAM 0x0000. After that try 0x1FD0.

@BertoldVdb: after removing the faulty USB hub, we have now run tests on 4 setups simultaneously for about 24 hours. We can confirm that reading 4 bytes from either 0x0000 or 0x1FD0 in a 1 ms loop works fine.

I guess this means there is a way to copy the info we need to a safe place and read it from there?

BertoldVdb commented 10 months ago

This is good! Yes, this most likely means that a safe place for reading the registers can be found. I don't have MS2131 hardware though, so can't test it effectively. I can buy one, but it will take a long time to arrive. Seeing that FOSDEM is in Belgium and I am too, I was wondering if you could send a unit by post?

Probably obvious, but just to confirm, removing the USB hub did not resolve the original issue, right?

gerryd commented 10 months ago

> This is good! Yes, this most likely means that a safe place for reading the registers can be found. I don't have MS2131 hardware though, so can't test it effectively. I can buy one, but it will take a long time to arrive. Seeing that FOSDEM is in Belgium and I am too, I was wondering if you could send a unit by post?

You just made my day. :)

We can definitely get one of our devices to you! I will drop you an e-mail, reply with your address, and I'll make it happen.

> Probably obvious, but just to confirm, removing the USB hub did not resolve the original issue, right?

It made it better (instead of failing hard it now recovers), but didn't fix it, indeed.

BertoldVdb commented 10 months ago

Yesterday evening I made the firmware modification; I will test it over the weekend. Is this the correct test: a 1080p HDMI signal into the input, the output not connected to anything, and the input status read as fast as possible while capturing?

gerryd commented 10 months ago

Awesome! Yes, that sounds like a realistic test scenario.

Actually, we are running a test at OpenFest in Sofia this weekend. Would it be possible for us to run it here as well over the weekend? We don't mind breaking things, it's only a test run anyway.

BertoldVdb commented 10 months ago

Yes, it should be possible. I am at work now but will send the info late tonight.

BertoldVdb commented 10 months ago

Please see this pull request: https://github.com/FOSDEM/video-ms213x-status/pull/1

I have emailed the required firmware for it: sha256=fff3376a24c14780f1bcac34c1315829528c3b33923a4da37f8dabbfcf8f984e name=modified.bin

Btw: the included USB cable is super flimsy. I had to replace it.
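Before flashing, the emailed image can be checked against the SHA-256 digest quoted above, which is equivalent to comparing the output of `sha256sum modified.bin` by hand. Below is a minimal Go sketch. It assumes the file sits in the current directory under the name given in the email. The `verify` helper is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// expectedSHA256 is the digest quoted for modified.bin in this thread.
const expectedSHA256 = "fff3376a24c14780f1bcac34c1315829528c3b33923a4da37f8dabbfcf8f984e"

// verify reports whether data hashes to the hex-encoded SHA-256 digest want.
func verify(data []byte, want string) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == want
}

func main() {
	data, err := os.ReadFile("modified.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !verify(data, expectedSHA256) {
		fmt.Fprintln(os.Stderr, "checksum mismatch: do not flash this file")
		os.Exit(1)
	}
	fmt.Println("modified.bin: OK")
}
```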

BertoldVdb commented 10 months ago

Test stopped; it was working fine for around 44 hours. How did it go on your side?

markvdb commented 10 months ago

Thank you for your work. We couldn't make it fall over, so that is definitely an improvement.

We did bump into issues. The memory areas we identified are seemingly not as perfect a reflection of the resolution as we thought they were. We have some more testing to do...

BertoldVdb commented 10 months ago

BTW, what exactly goes wrong with the registers?

krokodilerian commented 10 months ago

Two things come to mind, though we still need to test and summarize a bit more: