Using a prediction file with fewer genes than the training data results in an error

I ran the scaden process step using a bulk RNA-seq dataset (named NewData) that has about 18,000 genes, and then ran scaden predict on an older dataset that shares only about 15,000 genes with NewData. I got the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['PANO1', 'BTNL3', 'SSSCA1', 'PNOC', 'USMG5',\n ...\n 'TMEM56-RWDD3', 'SIGLEC6', 'CCR6', 'VARS', 'CTAGE5'],\n dtype='object', length=703)
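
For context, this message comes from pandas: label-based selection with a list raises a KeyError as soon as any requested label is absent. The snippet below is only a minimal reproduction on a made-up table, not scaden's actual code; the gene names are borrowed from the error above, and the table layout (genes as rows, samples as columns) and the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy expression table (made-up values): genes as rows, samples as columns.
expr = pd.DataFrame(
    {"sample_1": [5, 0, 12], "sample_2": [3, 7, 1]},
    index=["CCR6", "PNOC", "VARS"],
)

# Hypothetical gene list a model was trained on; SIGLEC6 is absent from `expr`.
training_genes = ["CCR6", "PNOC", "VARS", "SIGLEC6"]

try:
    # Selecting by a list that contains a missing label raises KeyError
    # in pandas 1.0 and later.
    expr.loc[training_genes]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

The exact wording of the message differs between pandas versions, but the cause is the same: at least one requested gene is not in the index.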

I worked around this by adding the missing genes to the old dataset with zero counts across all samples. Now predict runs without errors, but I don't know whether I should trust the results.
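
The zero-filling workaround described above can be sketched roughly as follows. This is an illustration under assumptions, not something scaden provides: the helper name align_to_training_genes is hypothetical, the table is assumed to have genes as rows and samples as columns with unique gene names, and the data is made up. It also reports how many training genes had to be filled in, which may help when judging how much to trust the predictions.

```python
import pandas as pd


def align_to_training_genes(expr, training_genes):
    """Reindex `expr` (genes x samples) to `training_genes`, zero-filling absent genes.

    Hypothetical helper for illustration. Returns the aligned table and the
    index of genes that were missing from `expr`. Requires unique gene names.
    """
    training_index = pd.Index(training_genes)
    # Genes the model expects that the prediction data does not contain.
    missing = training_index.difference(expr.index)
    # Keep only training genes, in training order; absent genes become all-zero rows.
    aligned = expr.reindex(training_index, fill_value=0)
    return aligned, missing


# Made-up example data: "EXTRA" is not a training gene and gets dropped.
expr = pd.DataFrame(
    {"s1": [5, 0, 12], "s2": [3, 7, 1]},
    index=["CCR6", "PNOC", "EXTRA"],
)
training_genes = ["CCR6", "PNOC", "VARS", "SIGLEC6"]

aligned, missing = align_to_training_genes(expr, training_genes)
fraction_missing = len(missing) / len(training_genes)
print(f"{len(missing)} of {len(training_genes)} training genes were zero-filled")
```

Note that this only makes the shapes match. It does not tell the model that the zeros are placeholders rather than measured values.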

Would the proper approach be to rerun the process-train-predict steps for each dataset that needs to be predicted?

KevinMenden/scaden, issue #103