GrahamDrive commented 3 months ago

Reverse Engineer A Satellite Codebase

CDH member @whatdoes3plus1equalsto has found some open-source satellite projects whose codebases we could benefit from reverse engineering and analysing. There are four parts to this task, listed here.

Reverse engineer the given satellite codebase: Take a deep dive into the satellite's codebase. Try to determine how their code is structured, for example: How do they control peripherals? Are they using a real-time operating system? If so, what tasks do they have and how are they organized? Also, take extra care to evaluate their error detection and correction code, such as CRC checks on their memory and other corruption-prevention measures.
Identify modules that could improve our design: Now that you have your footing in the codebase and how it operates, try to identify useful parts that are missing from our codebase and that we could implement in our satellite. Once you find some, bring them up with @GrahamDrive or @DaighB and we can discuss how they could be used.
Finally some coding!: Now that you know what you want to add to the satellite, you can try making a test project as a proof of concept. Go ahead and create a brand new project for your dev board and try to implement the feature you have decided on. Have fun with it: play around and add your own ideas and flair.
Show off your work: Now that you have completed the code and tested it, you can show it off in a meeting!

University of Patras

This task concerns the UPSat CubeSat from the University of Patras in Greece. It is completely open source and, conveniently for us, it uses an STM32 microcontroller. Its codebase can be found here. UPSat also has its own Wikipedia page that could have a ton of valuable information; the link to it is here.

As always, if you have any questions, don't hesitate to ask!

Koloss0 commented 3 weeks ago

I did some research on this on my own, and I thought I'd document what I've gathered thus far.

High-Level Tidbits

The link Graham provided brings you to the On-Board-Computer repository, but there is also
- an ADCS repo found here,
- a Comms repo found here, and
- a Power (or as they call it, Electrical Power System) repo here.
- There is another repo called ECSS Services, found here, which appears to contain lots of services and utilities that each subsystem uses.
They are using UART to communicate between subsystems.
The ADCS repo actually contains a B-Dot detumbling algorithm and a pointing algorithm using a sun sensor, here and here.

The On-Board-Computer (OBC)

Here are some points I've gathered about their On-Board-Computer (OBC):

They are using an RTOS with CMSIS, and a full FAT filesystem. The filesystem goes under the name mass_storage and is defined in the ECSS Services repo.
The cubeMX/ subfolder contains two STM32 projects: disco/ and obc/. disco/ appears to be some sort of testing ground for various things. obc/ appears to be the flight-ready code.
The only file of interest is really the main.c file. Every other file appears to be part of a third-party library.
obc_data is a giant struct containing all the state of OBC. It's NOT defined in the OBC repo though. Instead it's defined in the ECSS repo here.
Similar to TSAT, UPSAT OBC is using task scheduling with RTOS to perform all of its functions. There are five separate threads/tasks:
- UART_task (function definition): This task listens for incoming messages on the UART bus. There are four different message queues: EPS, COMMS, ADCS, & DBG. These queues are checked in an infinite loop, and it appears as though OBC is just relaying the messages back to UART, possibly so that the subsystems can communicate between one another.
- HK_task (function definition): "HK" actually stands for Housekeeping (not the best naming scheme tbh) and this is another service defined in the ECSS repo. At a high level, this task simply initialises the Housekeeping service with hk_INIT, defined in ECSS here, and then calls hk_SCH, defined here, in an infinite loop. At first glance, it's really hard to tell what HK is actually doing. More research may be needed to make sense of it.
- IDLE_task (function definition): Again, it's not clear what this one is doing. More research is required.
- SU_SCH (function definition): Same as above.
- sche_se_sch (function definition): Calls cross_schedules in an infinite loop, which is part of the scheduling_service service in ECSS. It's again not clear what it's doing, but it may have to do with executing "APIDs" whatever those are.
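
To make the relaying idea in UART_task a bit more concrete, here is a minimal host-side sketch of one pass of such a loop. It is not UPSat's code: the queue type, the `dest` field, `uart_send` and `relay_once` are all hypothetical names, and routing by a destination field is an assumption about how the relay might work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four sources named in the issue; the enum itself is hypothetical. */
typedef enum { SUBSYS_EPS, SUBSYS_COMMS, SUBSYS_ADCS, SUBSYS_DBG, SUBSYS_COUNT } subsystem_t;

#define QUEUE_DEPTH 4

/* Hypothetical message: a destination plus a dummy payload. */
typedef struct { subsystem_t dest; int payload; } message_t;

/* Small fixed-size ring buffer standing in for an RTOS message queue. */
typedef struct {
    message_t items[QUEUE_DEPTH];
    size_t head;
    size_t count;
} queue_t;

queue_t rx_queues[SUBSYS_COUNT];   /* one receive queue per subsystem */
int forwarded[SUBSYS_COUNT];       /* messages sent out per destination */

bool queue_push(queue_t *q, message_t m)
{
    if (q->count == QUEUE_DEPTH) {
        return false;              /* queue full, message dropped */
    }
    q->items[(q->head + q->count) % QUEUE_DEPTH] = m;
    q->count++;
    return true;
}

bool queue_pop(queue_t *q, message_t *out)
{
    if (q->count == 0) {
        return false;              /* nothing pending */
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* Stand-in for the UART transmit call: it only counts what was sent. */
void uart_send(const message_t *m)
{
    forwarded[m->dest]++;
}

/* One pass of the task loop: check every queue once and forward at most
   one pending message from each. Returns how many were relayed. */
size_t relay_once(void)
{
    size_t relayed = 0;
    for (int i = 0; i < SUBSYS_COUNT; i++) {
        message_t m;
        if (queue_pop(&rx_queues[i], &m)) {
            uart_send(&m);
            relayed++;
        }
    }
    return relayed;
}

/* Usage: ADCS sends a message addressed to COMMS, the OBC relays it. */
void demo_relay(void)
{
    message_t m = { SUBSYS_COMMS, 7 };
    assert(queue_push(&rx_queues[SUBSYS_ADCS], m));
    assert(relay_once() == 1);
}
```

In a real RTOS task, `relay_once` would sit inside the infinite loop, with a delay or a blocking wait so that other tasks get to run.
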
I'll probably be investigating their error handling practices next, so I'll post my findings when I do.

Koloss0 commented 3 weeks ago

I did some research on their error handling practices. I mostly looked at the ADCS and ECSS repositories since it actually looks like the OBC repo doesn't contain any error handling at all!

Here's what I jotted down:

Handling Errors

They don't do anything with the HAL Error_Handler function, which might make sense since it's not clear what the right corrective action might be if HAL throws an error. It does seem a little unfortunate that nothing is logged though.

/**
  * @brief  This function is executed in case of error occurrence.
  * @param  None
  * @retval None
  */
void Error_Handler(void)
{
  /* USER CODE BEGIN Error_Handler */
  /* User can add his own implementation to report the HAL error return state */
  while(1) 
  {
  }
  /* USER CODE END Error_Handler */ 
}

They also don't modify the assert_failed function, which gets called by HAL when a parameter to a function is invalid. Again, no logging is performed for such occurrences.

/**
   * @brief Reports the name of the source file and the source line number
   * where the assert_param error has occurred.
   * @param file: pointer to the source file name
   * @param line: assert_param error line source number
   * @retval None
   */
void assert_failed(uint8_t* file, uint32_t line)
{
  /* USER CODE BEGIN 6 */
  /* User can add his own implementation to report the file name and line number,
    ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
  /* USER CODE END 6 */

}
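
For comparison, here is a minimal sketch of what recording an assert failure could look like. It is an idea for our own code, not something UPSat does: the `assert_record_t` struct and the `last_assert` variable are hypothetical, and where the record should live on real hardware is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the most recent assert_param failure. */
typedef struct {
    const uint8_t *file;  /* source file name passed in by HAL */
    uint32_t line;        /* line number passed in by HAL */
    uint32_t count;       /* how many failures have been seen so far */
} assert_record_t;

/* volatile so a debugger or another context always sees fresh values. */
volatile assert_record_t last_assert = {0};

/* Same signature as the HAL hook shown above, but it keeps a record
   instead of silently returning. */
void assert_failed(uint8_t *file, uint32_t line)
{
    last_assert.file = file;
    last_assert.line = line;
    last_assert.count++;
}

/* Usage on a host machine: simulate a failure and check the record. */
void demo_assert_failed(void)
{
    static uint8_t file_name[] = "example.c";
    assert_failed(file_name, 123u);
    assert(last_assert.line == 123u);
    assert(last_assert.file == file_name);
}
```

On the satellite, the record would need to end up somewhere it can be read back later, such as telemetry or memory that survives a reset. That choice depends on our hardware and is not covered by this sketch.
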

`SAT_returnState`

Defined here in ECSS.

SAT_returnState is an enum type for errors, with 57 possible values in total (including SATR_OK). It's not the only enum for errors, but it appears to be the one that's used within the ECSS library.

`adcs_error_status`

Defined here in ADCS.

adcs_error_status is the enum type for ADCS errors. There are 10 total possible values. The errors don't appear to be particularly specific.

`error_handler`

Defined here in ADCS.

This gets called in the program's main loop once for each cycle of the TIM7 timer. This is accomplished by setting ADCS_event_period_status to TIMED_EVENT_NOT_SERVICED in the ISR for TIM7, and then checking on every iteration of the while loop whether it's set. Once it is set and the if statement is triggered, the flag gets reset to TIMED_EVENT_SERVICED inside the if block, shortly before error_handler is called.

int main(void) {

    // ... init code ..

    while (1) {
        /* GPS update */
        error_status = update_gps(&gps_state);

        /* Control loop runs at 68ms, interrupt runs every ~1.2s, WDG at ~2.4s */
        if (ADCS_event_period_status == TIMED_EVENT_NOT_SERVICED) {

            // ... code ...

            /* Update flag */
            ADCS_event_period_status = TIMED_EVENT_SERVICED; // reset condition to false.

            /* Software error handler runs for actuator and sensor 230ms*/
            adcs_sysview_print();

            error_handler(error_status);
            error_status = ERROR_OK;
        }
    }
}

The error handler function performs different operations based on the value of error_status.

`error_propagation`

Defined here in ADCS.

This function takes in a parameter current_error which is of type adcs_error_status.

If current_error is ERROR_OK (i.e. no error is passed), then the function defaults to returning error_status (which is ERROR_OK by default at the start of the program).
Otherwise, current_error is not ERROR_OK and that is returned instead. This creates a behaviour where the global error_status acts as a second-priority error code that is only reported when the calling function itself was successful.
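
The behaviour described above can be sketched as follows. This is a reconstruction from the description, not a copy of the ADCS source: the enum is an illustrative subset, `ERROR_EXAMPLE_OTHER` and `run_step` are hypothetical, and the numeric values are assumed.

```c
#include <assert.h>

/* Illustrative subset of adcs_error_status (the real enum has 10 values).
   ERROR_EXAMPLE_OTHER is a hypothetical placeholder for any other error. */
typedef enum {
    ERROR_OK,
    ERROR_FLASH,
    ERROR_EXAMPLE_OTHER
} adcs_error_status;

/* Global status, ERROR_OK at the start of the program. */
adcs_error_status error_status = ERROR_OK;

/* A new error wins; otherwise the previously stored status is returned. */
adcs_error_status error_propagation(adcs_error_status current_error)
{
    if (current_error == ERROR_OK) {
        return error_status;
    }
    return current_error;
}

/* Hypothetical stand-in for an init function: it does its work, ends up
   with local_result, and returns it through error_propagation. */
adcs_error_status run_step(adcs_error_status local_result)
{
    return error_propagation(local_result);
}

void demo_error_propagation(void)
{
    error_status = ERROR_OK;

    /* First step fails: the error is stored by the caller. */
    error_status = run_step(ERROR_FLASH);
    assert(error_status == ERROR_FLASH);

    /* Second step succeeds locally, but the old error is reported again. */
    error_status = run_step(ERROR_OK);
    assert(error_status == ERROR_FLASH);

    /* Third step fails differently: the earlier ERROR_FLASH is lost. */
    error_status = run_step(ERROR_EXAMPLE_OTHER);
    assert(error_status == ERROR_EXAMPLE_OTHER);
}
```

The last step of the demo shows the weakness: only one error code can be held at a time, so a later failure silently replaces an earlier one.
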

In the below code sequence found in the main function, you'll see that if any one of these function calls fails, the calls after it will also report the same error code.

error_status = init_mem(); // let's say this fails with ERROR_FLASH
error_status = increment_boot_counter(); // then this will return ERROR_FLASH too (even if the operation was successful)
error_status = init_obc_communication(adcs_boot_cnt); // same as above

Note that in the second line, if increment_boot_counter fails with something else, that error code will become the new secondary error code. This actually means errors can get overwritten and lost, and they won't be rectified when error_handler eventually gets called. This is a pretty bad way to deal with errors.

Bugs, Bugs, Everywhere

There is actually a humorous bug with increment_boot_counter. There is a case where it returns FLASH_ERROR when it should return ERROR_FLASH. This is an example of why having a good naming scheme is important.

FLASH_ERROR is defined in adcs_flash.h and is NOT part of the adcs_error_status enum. See below:

typedef enum {
    FLASH_ERROR = 0, FLASH_NORMAL
} flash_status;

And of course the code still compiles, because C implicitly converts between enum types even though mixing them like this is dangerous.
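
Here is a minimal sketch of why this mix-up matters and one way to avoid it. The `flash_status` enum is taken from the snippet above. Everything else is an assumption: the two-value `adcs_error_status` is a subset, `ERROR_OK` being 0 is assumed (I haven't confirmed its value), and `flash_to_adcs_error` and `buggy_return` are hypothetical names.

```c
#include <assert.h>

/* From adcs_flash.h, as quoted above. */
typedef enum {
    FLASH_ERROR = 0, FLASH_NORMAL
} flash_status;

/* Illustrative subset. ASSUMPTION: ERROR_OK has the value 0. */
typedef enum {
    ERROR_OK = 0,
    ERROR_FLASH
} adcs_error_status;

/* The buggy pattern: a flash_status value leaves a function whose return
   type is adcs_error_status. C would accept this without the cast; the
   cast is written out only to make the conversion visible. */
adcs_error_status buggy_return(void)
{
    return (adcs_error_status)FLASH_ERROR;
}

/* Safer pattern: translate between the two enums in exactly one place. */
adcs_error_status flash_to_adcs_error(flash_status status)
{
    return (status == FLASH_NORMAL) ? ERROR_OK : ERROR_FLASH;
}

void demo_enum_mixup(void)
{
    /* Under the assumption above, the flash failure reads as success. */
    assert(buggy_return() == ERROR_OK);

    /* The explicit mapping keeps the meaning intact. */
    assert(flash_to_adcs_error(FLASH_ERROR) == ERROR_FLASH);
    assert(flash_to_adcs_error(FLASH_NORMAL) == ERROR_OK);
}
```

Recent versions of GCC and Clang also offer a `-Wenum-conversion` warning for implicit conversions between different enum types, which could be worth enabling in our own build.
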

Honestly, the code in UPSAT is really bad. It's full of inconsistent naming schemes and mistakes. It's hard to believe this code made it to space.

GrahamDrive commented 1 week ago

I didn't realize there was so much information in their Git repository. The FAT system is particularly intriguing. I've been considering whether we should use one or not. I don't have experience with them in embedded systems, but I assume it would make our work significantly easier.

Thanks for this fantastic deep dive, Logan!

Koloss0 commented 1 week ago

UPSAT appears to be storing a lot of different types of data (I believe I even saw some scripts being loaded), whereas TSAT just needs to store telemetry data. So I think it's acceptable, if not desirable, that we don't have one.

UMSATS / cdh-tsat

Satellite Reverse Engineering #2 #29
