UMSATS / cdh-tsat

Contains software for the Command and Data Handling (CDH) board.
https://www.umsats.ca/

Satellite Reverse Engineering #2 #29

Open GrahamDrive opened 3 months ago

GrahamDrive commented 3 months ago

Reverse Engineer A Satellite Codebase

CDH member @whatdoes3plus1equalsto has found some open-source satellite projects whose codebases we could benefit from reverse engineering and analyzing. There are four parts to this task, listed below.

  1. Reverse engineer the given satellite codebase: Take a deep dive into the satellite's codebase and work out how the code is structured. Ask questions such as "How do they control peripherals?" and "Are they using a real-time operating system? If so, what tasks do they have and how are they organized?" Also take extra care to evaluate their error detection and correction code, such as CRC checks on memory and other measures against corruption.

  2. Identify modules that could improve our design: Now that you have your footing in the codebase and understand how it operates, try to identify useful parts that are missing from our codebase and that we could implement in our satellite. Once you find some, bring them up with @GrahamDrive or @DaighB and we can discuss how they could be used.

  3. Finally, some coding!: Now that you know what you want to add to the satellite, try making a test project as a proof of concept. Create a brand-new project for your dev board and implement the feature you have decided on. Have fun with it, play around, and add your own ideas and flair.

  4. Show off your work: Once you have completed and tested the code, show it off in a meeting!

University of Patras

This task concerns the UPSat CubeSat from the University of Patras in Greece. It is completely open source, and it is a good fit for us because it uses an STM32 microcontroller. Its codebase can be found here. UPSat also has its own Wikipedia page, which could hold a lot of valuable information; the link to it is here.

As always if you have any questions don't hesitate to ask!

Koloss0 commented 3 weeks ago

I did some research on this on my own, and I thought I'd document what I've gathered thus far.

High-Level Tidbits

The On-Board-Computer (OBC)

Here are some points I've gathered about their On-Board-Computer (OBC):

Koloss0 commented 3 weeks ago

I did some research on their error handling practices. I mostly looked at the ADCS and ECSS repositories since it actually looks like the OBC repo doesn't contain any error handling at all!

Here's what I jotted down:

Handling Errors

/**
  * @brief  This function is executed in case of error occurrence.
  * @param  None
  * @retval None
  */
void Error_Handler(void)
{
  /* USER CODE BEGIN Error_Handler */
  /* User can add his own implementation to report the HAL error return state */
  while(1) 
  {
  }
  /* USER CODE END Error_Handler */ 
}
/**
   * @brief Reports the name of the source file and the source line number
   * where the assert_param error has occurred.
   * @param file: pointer to the source file name
   * @param line: assert_param error line source number
   * @retval None
   */
void assert_failed(uint8_t* file, uint32_t line)
{
  /* USER CODE BEGIN 6 */
  /* User can add his own implementation to report the file name and line number,
    ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
  /* USER CODE END 6 */

}

SAT_returnState

Defined here in ECSS.

SAT_returnState is an enum type for errors, with 57 possible values in total (including SATR_OK). It's not the only enum for errors, but it appears to be the one that's used within the ECSS library.

adcs_error_status

Defined here in ADCS.

adcs_error_status is the enum type for ADCS errors. There are 10 total possible values. The errors don't appear to be particularly specific.

error_handler

Defined here in ADCS.

This gets called in the program's main loop once for each cycle of the TIM7 timer. The ISR for TIM7 sets ADCS_event_period_status to TIMED_EVENT_NOT_SERVICED, and the while loop checks that flag on every iteration. When the flag is set, the if block runs: the flag is reset to TIMED_EVENT_SERVICED, and then error_handler is called.

int main(void) {

    // ... init code ..

    while (1) {
        /* GPS update */
        error_status = update_gps(&gps_state);

        /* Control loop runs at 68ms, interrupt runs every ~1.2s, WDG at ~2.4s */
        if (ADCS_event_period_status == TIMED_EVENT_NOT_SERVICED) {

            // ... code ...

            /* Update flag */
            ADCS_event_period_status = TIMED_EVENT_SERVICED; // reset condition to false.

            /* Software error handler runs for actuator and sensor 230ms*/
            adcs_sysview_print();

            error_handler(error_status);
            error_status = ERROR_OK;
        }
    }
}

The error handler function performs different operations based on the value of error_status.
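The flag handshake between the TIM7 ISR and the main loop can be sketched on its own, without any hardware. This is only an illustration, not UPSat's code: tim7_isr_sketch, main_loop_pass, timed_work_runs, flag_demo and the enum's numeric values are made up for the example. Only ADCS_event_period_status and the two TIMED_EVENT_* names come from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values named as in the loop above; the numeric values are assumed. */
typedef enum {
    TIMED_EVENT_SERVICED = 0,
    TIMED_EVENT_NOT_SERVICED
} timed_event_status;

/* volatile because it is written in interrupt context and read in the main loop. */
static volatile timed_event_status ADCS_event_period_status = TIMED_EVENT_SERVICED;

/* Hypothetical counter so the sketch has something observable to do. */
static uint32_t timed_work_runs = 0;

/* Stand-in for the TIM7 interrupt handler (hypothetical name). */
void tim7_isr_sketch(void)
{
    ADCS_event_period_status = TIMED_EVENT_NOT_SERVICED;
}

/* One pass of the main loop; returns 1 if the timed work ran on this pass. */
int main_loop_pass(void)
{
    if (ADCS_event_period_status != TIMED_EVENT_NOT_SERVICED) {
        return 0;                                     /* no tick since the last service */
    }
    ADCS_event_period_status = TIMED_EVENT_SERVICED;  /* clear first, as in the loop above */
    timed_work_runs++;                                /* placeholder for the timed work */
    return 1;
}

uint32_t timed_work_run_count(void)
{
    return timed_work_runs;
}

/* Usage: the timed work runs once per tick, however often the loop spins. */
void flag_demo(void)
{
    assert(main_loop_pass() == 0);   /* no tick yet */
    tim7_isr_sketch();               /* timer fires */
    assert(main_loop_pass() == 1);   /* serviced exactly once */
    assert(main_loop_pass() == 0);   /* flag already cleared */
}
```

One thing the sketch makes visible: if two ticks arrive before the loop services the flag, they collapse into a single run, so the pattern assumes a loop pass finishes well inside one timer period.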

error_propagation

Defined here in ADCS.

This function takes a single parameter, current_error, of type adcs_error_status.

In the code sequence below, taken from the main function, if any one of these calls fails, the calls after it will also report the same error code.

error_status = init_mem(); // let's say this fails with ERROR_FLASH
error_status = increment_boot_counter(); // then this will return ERROR_FLASH too (even if the operation was successful)
error_status = init_obc_communication(adcs_boot_cnt); // same as above

Note that if increment_boot_counter in the second line fails with a different error, that error code replaces the earlier one. This means errors can get overwritten and lost, so they won't be dealt with when error_handler eventually gets called. This is a pretty bad way to deal with errors.
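To make the overwrite problem concrete, here is a minimal sketch of how error_propagation would have to behave to produce what's described above. It is reconstructed from that behaviour rather than copied from the ADCS repo, so treat the function body, the enum layout, ERROR_SENSOR, and the run_sequence and propagation_demo helpers as assumptions. Only the names ERROR_OK, ERROR_FLASH, error_status and error_propagation appear in the thread.

```c
#include <assert.h>

/* Assumed layout: only ERROR_OK and ERROR_FLASH are named in the thread. */
typedef enum {
    ERROR_OK = 0,
    ERROR_SENSOR,   /* hypothetical second failure kind */
    ERROR_FLASH
} adcs_error_status;

/* Single shared status, as implied by "the rest will also fail". */
static adcs_error_status error_status = ERROR_OK;

/* Assumed behaviour: success reports the stored error, failure replaces it. */
adcs_error_status error_propagation(adcs_error_status current_error)
{
    if (current_error == ERROR_OK) {
        return error_status;        /* a successful step echoes the earlier failure */
    }
    error_status = current_error;   /* a new failure overwrites the old one */
    return current_error;
}

/* Hypothetical helper: runs n steps with the given outcomes, returns the final status. */
adcs_error_status run_sequence(const adcs_error_status *outcomes, int n)
{
    error_status = ERROR_OK;
    for (int i = 0; i < n; i++) {
        error_status = error_propagation(outcomes[i]);
    }
    return error_status;
}

/* Usage: the first failure sticks, unless a later failure replaces it. */
void propagation_demo(void)
{
    const adcs_error_status sticky[] = { ERROR_FLASH, ERROR_OK, ERROR_OK };
    const adcs_error_status lost[]   = { ERROR_FLASH, ERROR_SENSOR, ERROR_OK };

    assert(run_sequence(sticky, 3) == ERROR_FLASH);   /* later successes echo ERROR_FLASH */
    assert(run_sequence(lost, 3) == ERROR_SENSOR);    /* ERROR_FLASH is gone */
}
```

With a single status variable there is no way to keep both failures. One possible alternative, not something UPSat does as far as this thread shows, would be to record each failure as a separate bit so that none of them is lost.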

Bugs, Bugs, Everywhere

There is actually a humorous bug in increment_boot_counter. There is a case where it returns FLASH_ERROR when it should return ERROR_FLASH. This is an example of why having a good naming scheme is important.

FLASH_ERROR is defined in adcs_flash.h and is NOT part of the adcs_error_status enum. See below:

typedef enum {
    FLASH_ERROR = 0, FLASH_NORMAL
} flash_status;

And of course the code still compiles, because C lets enum values convert implicitly to integers and to other enum types, even though mixing them like this is dangerous.
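Here is a small sketch of why this mix-up is worse than a naming slip. It assumes ERROR_OK is the zero-valued member of adcs_error_status, which the thread doesn't confirm. If that holds, FLASH_ERROR (also 0) is indistinguishable from success once it's returned as an adcs_error_status. The two function names and the flash_write_ok parameter are hypothetical stand-ins, not the real increment_boot_counter.

```c
#include <assert.h>

/* Copied from adcs_flash.h as quoted above. */
typedef enum {
    FLASH_ERROR = 0, FLASH_NORMAL
} flash_status;

/* Assumed layout: ERROR_OK == 0 is an assumption, not confirmed by the thread. */
typedef enum {
    ERROR_OK = 0,
    ERROR_FLASH
} adcs_error_status;

/* Mimics the bug: a constant from the wrong enum is returned, and C accepts it. */
adcs_error_status buggy_increment_boot_counter(int flash_write_ok)
{
    if (!flash_write_ok) {
        return FLASH_ERROR;     /* intended: ERROR_FLASH */
    }
    return ERROR_OK;
}

/* What the function presumably meant to do. */
adcs_error_status fixed_increment_boot_counter(int flash_write_ok)
{
    return flash_write_ok ? ERROR_OK : ERROR_FLASH;
}

/* Usage: under the assumed layout, the buggy version reports a failed write as success. */
void enum_mixup_demo(void)
{
    assert(buggy_increment_boot_counter(0) == ERROR_OK);     /* failure looks like success */
    assert(fixed_increment_boot_counter(0) == ERROR_FLASH);  /* failure is reported */
}
```

Compilers can be asked to catch this. GCC and Clang both have a -Wenum-conversion warning for implicit conversions between different enum types, though whether it fires by default depends on the compiler and version.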

Honestly, the code in UPSAT is really bad. It's full of inconsistent naming schemes and mistakes. It's hard to believe this code made it to space.

GrahamDrive commented 1 week ago

I didn't realize there was so much information in their Git repository. The FAT system is particularly intriguing. I've been considering whether we should use one; I don't have experience with them in embedded systems, but I assume it would make our work significantly easier.

Thanks for this fantastic deep dive, Logan!

Koloss0 commented 1 week ago

UPSAT appears to be storing a lot of different types of data (I believe I even saw some scripts being loaded), whereas TSAT just needs to store telemetry data. So I think it's acceptable, if not desirable, that we don't have one.