Data Processing Steps for 5' UTR Extraction

I am currently studying your work on the UTR-LM model and find the research highly insightful. I have a question about the data processing steps described in Section A.1 of the supporting information of your paper. The paper mentions that 5' UTR sequences were collected from the Ensembl database and then passed through several cleaning and filtering steps. However, it does not fully describe how the sequences were extracted from Ensembl and processed into the final dataset used in the study. Could you please provide more detail on these steps? Thank you very much for your time and assistance. I look forward to your response.

a96123155 / UTR-LM

Data Processing Steps for 5' UTR Extraction #6