Open reece opened 9 months ago
Hello @reece! I am Manul, a backend engineer from India. I build RESTful APIs in TypeScript and NestJS, with PostgreSQL as the database. In my current project, I am implementing Redis for session management in my organisation. I have also contributed to Python-based open source projects.
I am interested in implementing these storage backends for SeqRepo and in being part of the biocommons community. I couldn't find much information here, so could you please tell me what steps or tasks, other than proposal preparation, I need to follow to become a contributor to the biocommons org? Also, is there another communication channel I should join, since I can't enter the official Slack without a domain email?
Dear @reece,
I hope this message finds you well. I am Harsha Aditya, a third-year undergraduate student at IIT Kanpur, majoring in Bioengineering. I am excited to apply for the SeqRepo project internship opportunity and contribute to its development.
Vision for the Project: My vision for SeqRepo is to extend its capabilities by implementing an abstract interface that supports various storage backends, caching mechanisms, and federation layers. I aim to create a flexible and scalable solution that seamlessly integrates with different data sources while ensuring fast and reliable access to biological sequences. Leveraging my expertise in C++ and Python, along with my knowledge of sequence alignment algorithms, I intend to enhance SeqRepo's functionality to meet the evolving needs of bioinformatics research and clinical genetics reporting.
Existing Skills: As a Quant developer and researcher at Devine Group and WorldQuant, I have gained significant experience in Python programming and utilizing common libraries. My background in quantitative finance has honed my skills in data analysis, algorithm development, and software engineering. Additionally, my knowledge of sequence alignment software and algorithms will be instrumental in understanding the domain-specific requirements of SeqRepo and designing efficient solutions.
Skills to Learn: While I am proficient in Python, I recognize the importance of expanding my skills to include backend-specific technologies such as Redis and AWS S3 for this project. I am committed to dedicating time to self-study and practical application to acquire the necessary skills. Furthermore, I am eager to deepen my understanding of caching techniques and explore how they can be applied to optimize SeqRepo's performance.
Implementation Timeline: Based on my initial assessment, I estimate that defining and implementing the abstract interface will take approximately 50 hours. Adapting the Fastadir to use the interface and incorporating the REST interface could require around 70 hours. Implementing a local sequence cache may take 40 hours, while integrating Redis, S3, or other backends could vary depending on their complexity, requiring around 55-60 hours each.
Conclusion: I am enthusiastic about the opportunity to contribute to SeqRepo and leverage my skills to address contemporary challenges in bioinformatics. I am confident that my background in C++, Python, and bioengineering, combined with my research experience, makes me well suited for this project. I am eager to collaborate with you and the team to achieve our objectives and advance SeqRepo's capabilities.
Thank you for considering my application. I look forward to the possibility of working together on this exciting project. Please direct me to further steps.
Warm regards, Harsha Aditya
Also linking #61 to this
Hi @jsstevenson! Is there any plan to implement new backends in the project anytime soon? I would like to work on this outside GSoC, and I would be happy to learn the new tech involved if you could suggest some starting points.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Summary
SeqRepo provides a simple interface to biological sequences and subsequences, with a single backend that provides fast random access to local, non-redundant, compressed, and journaled sequences. The original use case for SeqRepo was to provide fast and reliable access to sequences in a clinical genetics reporting pipeline. (See design)
The goal of this issue is to create an abstract interface that supports other storage backends, as well as caching and federation layers as depicted here:
See #61 for additional information.
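To make the goal concrete, here is a minimal Python sketch of how an abstract backend interface, a caching layer, and a federation layer could compose. It is an illustration only and is not SeqRepo's actual API: the class names (SequenceBackend, InMemoryBackend, CachingBackend, FederatedBackend), the fetch method and its signature, and the use of KeyError to signal a missing sequence are all assumptions made for this example. The in-memory backend stands in for a real store such as the existing local files, Redis, S3, or a REST service.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import Dict, Optional, Sequence


class SequenceBackend(ABC):
    """Hypothetical read-only backend interface (not SeqRepo's real API)."""

    @abstractmethod
    def fetch(self, seq_id: str, start: Optional[int] = None,
              end: Optional[int] = None) -> str:
        """Return the sequence, or its half-open slice [start, end).

        Raises KeyError when seq_id is unknown to this backend.
        """


class InMemoryBackend(SequenceBackend):
    """Toy backend standing in for a real store (files, Redis, S3, REST)."""

    def __init__(self, sequences: Dict[str, str]) -> None:
        self._sequences = dict(sequences)

    def fetch(self, seq_id, start=None, end=None):
        return self._sequences[seq_id][start:end]


class CachingBackend(SequenceBackend):
    """LRU cache of whole sequences in front of another backend."""

    def __init__(self, inner: SequenceBackend, max_entries: int = 128) -> None:
        self._inner = inner
        self._max_entries = max_entries
        self._cache = OrderedDict()  # seq_id -> full sequence, oldest first

    def fetch(self, seq_id, start=None, end=None):
        if seq_id in self._cache:
            self._cache.move_to_end(seq_id)  # mark as most recently used
            seq = self._cache[seq_id]
        else:
            seq = self._inner.fetch(seq_id)  # may raise KeyError
            self._cache[seq_id] = seq
            if len(self._cache) > self._max_entries:
                self._cache.popitem(last=False)  # evict least recently used
        return seq[start:end]


class FederatedBackend(SequenceBackend):
    """Queries each backend in order and returns the first hit."""

    def __init__(self, backends: Sequence[SequenceBackend]) -> None:
        self._backends = list(backends)

    def fetch(self, seq_id, start=None, end=None):
        for backend in self._backends:
            try:
                return backend.fetch(seq_id, start, end)
            except KeyError:
                continue  # not in this backend; try the next one
        raise KeyError(seq_id)


# Usage: a cache in front of a federation of two toy backends.
local = InMemoryBackend({"seq1": "ACGTACGT"})
remote = InMemoryBackend({"seq2": "TTTTGGGG"})
repo = CachingBackend(FederatedBackend([local, remote]), max_entries=2)
print(repo.fetch("seq1", 2, 6))  # GTAC
```

Because every layer implements the same interface, layers can be stacked in any order. One design choice to note: this sketch caches whole sequences, which is simple but could be costly for chromosome-sized records, so a real implementation might cache fixed-size chunks instead.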
Community Benefits
When implemented, this project will enable the following (and ideally implement a few of them):
Expected Results / Deliverables
Required and Desired Skills
Benefits to Intern
The intern will gain software architecture and interface abstraction skills while solving a contemporary practical issue for modern bioinformatics.
How to apply
Students applying to this project should briefly describe their vision for this project, highlight their existing skills and the skills they would need to learn, and estimate an implementation timeline.