OpenAMP / open-amp

The main OpenAMP library implementing RPMSG, Virtio, and Remoteproc for RTOS etc
https://www.openampproject.org/
Other
678 stars 278 forks source link

remoteproc: fix management of non loadable segments #553

Closed UmairKhanUET closed 4 months ago

UmairKhanUET commented 4 months ago

The elf loader assumes that the last ELF program segment will always be a LOAD type segment. I deduce this from the fact that the elf_load() function, when loading the remote ELF sections during the RPROC_LOADER_READY_TO_LOAD stage, compares the last load segment num to the total ELF sections to determine if the loading is complete and it can move to the next stage RPROC_LOADER_POST_DATA_LOAD. If the last program segment in the ELF is not of type LOAD, the last loaded LOAD segment never equals total ELF sections. This creates an error condition and the firmware loading fails. This patch fixes this issue by comparing the last loaded LOAD segment number with the max LOAD segment number in the ELF.

This PR fixes OpenAMP#552

Signed-off-by: Muhammad Umair Khan umair_khan@mentor.com

UmairKhanUET commented 4 months ago

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

arnopo commented 4 months ago

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

The elf_loader.c was upstreamed before we put in place the checkpatch. That's why you can see some old stlyles occurrences. Nevertheless, for new patch we expect that the coding rules are respected when possible. So please fix it, unless you see a reason not to do it.

UmairKhanUET commented 4 months ago

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

The elf_loader.c was upstreamed before we put in place the checkpatch. That's why you can see some old stlyles occurrences. Nevertheless, for new patch we expect that the coding rules are respected when possible. So please fix it, unless you see a reason not to do it.

@arnopo I have resolved the warning. Please resume your review.

arnopo commented 4 months ago

Could you provide some debug trace of your issue, that we understand what you mean by "This creates an error condition and the firmware loading fails" ? regarding the code seems that you falls into this condition : https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L574-L579

UmairKhanUET commented 4 months ago

Code changes might not be needed. Comments to clear up any misunderstanding on my part are welcome.

@edmooring I believe I answered your concern in the other comment thread. I am re-requesting review. Feel free to leave further comments if you have more questions.

UmairKhanUET commented 4 months ago

Could you provide some debug trace of your issue, that we understand what you mean by "This creates an error condition and the firmware loading fails" ? regarding the code seems that you falls into this condition : https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L574-L579

Yeah, you got it right @arnopo , it does land at the mentioned location. Technically, it shouldn't have landed here because it was already past the last LOAD segment in the previous iteration. This time now, it returns back from https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L578 with a valid da of the non-LOAD segment returned by the elf_next_load_segment() function, without setting the load state to RPROC_LOADER_POST_DATA_LOAD and with nmemsize = 0 (the nmemsize variable is updated to the segment length after the point of return https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L581 )

As the da is valid, the calling function remoteproc_load() will attempt to load it and call remoteproc_mmap(). With nmemsize equal to 0, the error condition in in remoteproc_mmap() at line https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/remoteproc.c#L381 will hit. The pa will remain as METAL_BAD_PHYS which will make remoteproc_load() enter the error condition at https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/remoteproc.c#L562 and hence the loading fails.

I am putting the trace log for further information:

metal: debug:     remoteproc_load: check remoteproc status
metal: debug:     remoteproc_load: open executable image
metal: debug:     remoteproc_load: check loader
metal: debug:     remoteproc_load: loading headers
metal: debug:     Loading ELF headering
metal: debug:     Loading ELF program header.
metal: debug:     Loading ELF section header.
metal: debug:     remoteproc_load, load header 0x0, 0x100, next 0xcbc98, 0x550
metal: debug:     Loading ELF section header.
metal: debug:     Loading ELF section header complete.
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbc98, 0x550, next 0xcbb25, 0x170
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbb25, 0x170, next 0xcbb25, 0x0
metal: debug:     remoteproc_load: load executable data
metal: debug:     segment: 2, total segs 4
metal: debug:     load data: da 0x60000000, offset 0xb8, len = 0x6500, memsize = 0x9b64, state 0x10802
metal: debug:     segment: 3, total segs 4
metal: debug:     load data: da 0x60009b64, offset 0x6664, len = 0x19c, memsize = 0x19c, state 0x10803
metal: debug:     cannot find more segment
metal: debug:     load data: da 0x60006194, offset 0x624c, len = 0x0, memsize = 0x0, state 0x10804
metal: error:     load failed, no mapping for 0x60006194.

With the changes under review here, the loading is successful:

metal: debug:     remoteproc_load: check remoteproc status
metal: debug:     remoteproc_load: open executable image
metal: debug:     remoteproc_load: check loader
metal: debug:     remoteproc_load: loading headers
metal: debug:     Loading ELF headering
metal: debug:     Loading ELF program header.
metal: debug:     Loading ELF section header.
metal: debug:     remoteproc_load, load header 0x0, 0x100, next 0xcbc98, 0x550
metal: debug:     Loading ELF section header.
metal: debug:     Loading ELF section header complete.
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbc98, 0x550, next 0xcbb25, 0x170
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbb25, 0x170, next 0xcbb25, 0x0
metal: debug:     remoteproc_load: load executable data
metal: debug:     segment: 2, last load seg 3
metal: debug:     load data: da 0x60000000, offset 0xb8, len = 0x6500, memsize = 0x9b64, state 0x10802
metal: debug:     segment: 3, last load seg 3
metal: debug:     load data: da 0x60009b64, offset 0x6664, len = 0x19c, memsize = 0x19c, state 0x20803
metal: debug:     load data: da 0xffffffffffffffff, offset 0x0, len = 0x0, memsize = 0x0, state 0x40803
metal: debug:     remoteproc_load, update resource table
metal: debug:     remoteproc_load: successfully load firmware
arnopo commented 4 months ago

Thanks @UmairKhanUET for the details and trace. Has @edmooring mentioned this This code is pretty old and has never been matured.

Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

        nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
        phdr = elf_next_load_segment(*img_info, &nsegment, da,
                         noffset, &nsize, &nsegmsize);

        phnums = elf_phnum(*img_info);
        if (phdr) {
            *nlen = nsize;
            *nmemsize = nsegmsize;
            metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
        }

        if (nsegment == phnums) {
            if (phdr) {
                *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                      RPROC_LOADER_POST_DATA_LOAD;
            } else {
                    metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
                *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                      RPROC_LOADER_LOAD_COMPLETE;
                      *da = RPROC_LOAD_ANYADDR;
        }

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

UmairKhanUET commented 4 months ago

Thanks @UmairKhanUET for the details and trace. Has @edmooring mentioned this This code is pretty old and has never been matured.

Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

      nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
      phdr = elf_next_load_segment(*img_info, &nsegment, da,
                       noffset, &nsize, &nsegmsize);

      phnums = elf_phnum(*img_info);
      if (phdr) {
          *nlen = nsize;
          *nmemsize = nsegmsize;
          metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
      }

      if (nsegment == phnums) {
          if (phdr) {
              *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                    RPROC_LOADER_POST_DATA_LOAD;
          } else {
                  metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
              *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                    RPROC_LOADER_LOAD_COMPLETE;
                    *da = RPROC_LOAD_ANYADDR;
      }

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

Thanks @arnopo for the feedback. I will try this out and post the results here.

UmairKhanUET commented 4 months ago

Thanks @UmairKhanUET for the details and trace. Has @edmooring mentioned this This code is pretty old and has never been matured. Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

        nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
        phdr = elf_next_load_segment(*img_info, &nsegment, da,
                         noffset, &nsize, &nsegmsize);

        phnums = elf_phnum(*img_info);
        if (phdr) {
            *nlen = nsize;
            *nmemsize = nsegmsize;
            metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
        }

        if (nsegment == phnums) {
            if (phdr) {
                *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                      RPROC_LOADER_POST_DATA_LOAD;
            } else {
                    metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
                *load_state = (*load_state & (~RPROC_LOADER_MASK)) |
                      RPROC_LOADER_LOAD_COMPLETE;
                      *da = RPROC_LOAD_ANYADDR;
        }

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

Thanks @arnopo for the feedback. I will try this out and post the results here.

Hi @arnopo , I tried your change and it works fine. I also stress tested it by adding two trailing non-LOAD program segments instead of one and it nicely caters this scenario as well. I have pushed the updates in this PR.

arnopo commented 4 months ago

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

UmairKhanUET commented 4 months ago

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

arnopo commented 4 months ago

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

unclear, I was speaking about the commit subject :-)

UmairKhanUET commented 4 months ago

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

unclear, I was speaking about the commit subject :-)

@arnopo I hope I did it right this time :D