ARMmbed / htrun

MOVED: https://github.com/ARMmbed/mbed-os-tools (Flash, reset and run host supervised tests on mbed platforms)

Re-try flashing in case of flash failures. #159

Closed deepikabhavnani closed 7 years ago

deepikabhavnani commented 7 years ago

Code change picked up from 'Brian Daniels' repo: copy_and_remount_refactor

@studavekar @mazimkhan @bridadan : Please review

jupe commented 7 years ago

Why can it fail the first time? We have seen several flash burn-out issues, but reflashing doesn't help there.

bridadan commented 7 years ago

@jupe It can fail for a number of reasons, not just flash failure. If the OS copies the binary file out of order, DAPLink will fail with an error. At this point it should be retried.

RomanSaveljev commented 7 years ago

> It can fail for a number of reasons

@bridadan Could you elaborate, please? I do not think a failure while copying to an MSD (whether it is provided by DAPLink or a USB stick) is a normal case.

jupe commented 7 years ago

> If the OS copies the binary file out of order

What does this mean?

What are the other possible failure reasons? If reflashing is needed, it sounds as if it might actually hide the root cause of the failure, which should be fixed instead. I would like to understand more about those corner cases.

bridadan commented 7 years ago

@jupe @RomanSaveljev I understand your concern, but this is not to hide a deeper failure.

From past discussions with @c1728p9, DAPLink requires all Mass Storage operations from the host OS to happen "in order". When the OS sends the actual USB traffic representing the binary file, it sends it in chunks. Normally, the OS sends these in order, starting with the first chunk (i.e. "data that should go in memory address 0") and ending with the last chunk (i.e. "the last byte in this chunk will be the last byte written to memory"). However, per the Mass Storage specification, the OS is allowed to send the data out of order if necessary, and it will do this occasionally. I don't know the specific reasons for this, but @c1728p9 might know.
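To make the ordering requirement concrete, here is a toy model in Python. It is only a sketch of the idea, not DAPLink code: `InOrderWriter` and `send` are hypothetical names, and the chunk format (an offset plus a byte string) is an assumption made for illustration.

```python
class InOrderWriter:
    """Toy model of a programmer that accepts only sequential chunks.

    Hypothetical illustration; this is not how DAPLink is implemented.
    """

    def __init__(self):
        self.next_offset = 0        # offset the next chunk must start at
        self.written = bytearray()  # stands in for the target's flash

    def write_chunk(self, offset, data):
        # Anything that does not start exactly where the last chunk ended
        # is treated as an out-of-order transfer and rejected.
        if offset != self.next_offset:
            raise ValueError(
                "out-of-order chunk at offset %d, expected %d"
                % (offset, self.next_offset)
            )
        self.written += data
        self.next_offset += len(data)


def send(writer, chunks):
    """Feed (offset, data) pairs to the writer; return True on success."""
    try:
        for offset, data in chunks:
            writer.write_chunk(offset, data)
    except ValueError:
        return False
    return True
```

With this model, the same chunks succeed when sent sequentially and fail when the host reorders them, which is the situation described above.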

DAPLink requires the file to be sent in order because it doesn't have enough RAM to buffer the entire binary file before writing it to the target's flash. So if the OS sends the file out of order, the correct response to this is to just retry the flashing. Since we don't have any control over how the OS copies files, retrying is really the only option when flashing via the MSD. If we want to avoid this issue, we would need to flash over the debug channel.
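A minimal sketch of what "just retry" could look like on the host side. This is not the code in this PR: `flash_with_retry` and its parameters are hypothetical, and it assumes a failed attempt shows up as a `FAIL.TXT` file on the drive once the interface has remounted.

```python
import os
import shutil
import time


def flash_with_retry(image_path, mount_point, retries=3, settle_s=2.0,
                     copy=shutil.copy, sleep=time.sleep):
    """Copy the image to the MSD, retrying while FAIL.TXT is present.

    Returns the attempt number that succeeded, or None if every attempt
    failed. Assumption: FAIL.TXT on the drive means the last copy failed.
    """
    fail_file = os.path.join(mount_point, "FAIL.TXT")
    for attempt in range(1, retries + 1):
        copy(image_path, mount_point)
        # Give the interface time to program the target and remount.
        # A fixed delay is a simplification; real code would likely poll.
        sleep(settle_s)
        if not os.path.exists(fail_file):
            return attempt
    return None
```

The `copy` and `sleep` parameters exist only so the loop can be exercised without hardware. The retry count is bounded so that a genuinely broken board still surfaces as a failure instead of being retried forever.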

@c1728p9 If I got anything wrong could you please correct me? Sorry for the wall of text, it's a fairly involved case 😄

bridadan commented 7 years ago

> I do not think failure copying to MSD (be it provided by daplink or usb stick) is a normal case?

True, not with a USB stick, and usually not with DAPLink either. However, it does happen, and we should try to catch these cases in automation.

c1728p9 commented 7 years ago

That all looks correct, @bridadan. I have investigated these failures, so if you need further details on what causes them (typically a disk cache flush) and how they can be recognized, let me know. A few common error messages in FAIL.TXT that can occur due to behavior out of DAPLink's control are:

RomanSaveljev commented 7 years ago

Thanks, the RAM limitation makes sense and this is valuable information. However, it raises the question of whether MSD flashing is the right choice for the automation farm use case. No need to debate it right now, but I think we should look more into flashing with a debugger.

bridadan commented 7 years ago

*cough* pyOCD *cough* 😄

bridadan commented 7 years ago

@mazimkhan This should be ready for your review now

mazimkhan commented 7 years ago

Comments added.