BIONF / FAS

FAS - Tool for Feature Architecture Similarity calculation
https://bionf.github.io/FAS/
GNU General Public License v3.0

replace CAST #4

Closed JuRuDo closed 4 years ago

JuRuDo commented 4 years ago

CAST is a 32-bit executable, so some computing environments may not be able to run it because they lack the required libraries: https://github.com/BIONF/HaMStR/issues/25

We should make it possible to deactivate CAST for these cases.

trvinh commented 4 years ago

Solved with this solution: https://askubuntu.com/questions/454253/how-to-run-32-bit-app-in-ubuntu-64-bit

trvinh commented 4 years ago

Alternative to CAST: http://biology.mcgill.ca/faculty/harrison/flps.html

trvinh commented 4 years ago

@JuRuDo, @ebersber The annoFAS function can now run fLPS to search for low-complexity regions and write XML output in the same format as the other annotation tools. The change is in the develop branch and will not be merged into the master branch until everything else is done. I tested it on 2 human sequences; the output looked like this:

<?xml version="1.0"?>
<tool name="fLPS">
    <protein id="A0A024R1R8" length="64">
        <feature type="SINGLE_{K}" instance="1">
            <start start="8"/>
            <end end="64"/>
        </feature>
        <feature type="SINGLE_{E}" instance="1">
            <start start="21"/>
            <end end="41"/>
        </feature>
        <feature type="WHOLE_{K}" instance="1">
            <start start="1"/>
            <end end="64"/>
        </feature>
    </protein>
    <protein id="A0A075B6Q4" length="140">
        <feature type="SINGLE_{D}" instance="1">
            <start start="10"/>
            <end end="38"/>
        </feature>
        <feature type="SINGLE_{E}" instance="1">
            <start start="7"/>
            <end end="131"/>
        </feature>
        <feature type="MULTIPLE_{DE}" instance="1">
            <start start="2"/>
            <end end="131"/>
        </feature>
        <feature type="WHOLE_{D}" instance="1">
            <start start="1"/>
            <end end="140"/>
        </feature>
    </protein>
</tool>

The predicted features overlap, and fLPS finds more biased residues than CAST (see the CAST output below):

<?xml version="1.0"?>
<tool name="CAST">
    <protein id="A0A024R161" length="153">
    </protein>
    <protein id="A0A024R1R8" length="64">
        <feature type="K-rich" instance="1">
            <start start="8"/>
            <end end="64"/>
        </feature>
    </protein>
    <protein id="A0A075B6Q4" length="140">
        <feature type="D-rich" instance="1">
            <start start="2"/>
            <end end="38"/>
        </feature>
    </protein>
</tool>

The FAS scores using CAST and fLPS differ slightly (0.32106 vs 0.32676), but this needs to be tested at a large scale.
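The coverage difference between the two outputs above can be checked with a short script. This is only a minimal sketch, not part of FAS or annoFAS: it assumes the XML layout shown in this thread (`protein` > `feature` > `start`/`end`), uses the A0A075B6Q4 entries copied from both outputs, and the helper name `covered_residues` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Entries for A0A075B6Q4 copied from the fLPS and CAST outputs in this thread.
FLPS_XML = """<tool name="fLPS">
    <protein id="A0A075B6Q4" length="140">
        <feature type="SINGLE_{D}" instance="1">
            <start start="10"/>
            <end end="38"/>
        </feature>
        <feature type="SINGLE_{E}" instance="1">
            <start start="7"/>
            <end end="131"/>
        </feature>
        <feature type="MULTIPLE_{DE}" instance="1">
            <start start="2"/>
            <end end="131"/>
        </feature>
        <feature type="WHOLE_{D}" instance="1">
            <start start="1"/>
            <end end="140"/>
        </feature>
    </protein>
</tool>"""

CAST_XML = """<tool name="CAST">
    <protein id="A0A075B6Q4" length="140">
        <feature type="D-rich" instance="1">
            <start start="2"/>
            <end end="38"/>
        </feature>
    </protein>
</tool>"""


def covered_residues(xml_text):
    """Map each protein id to the number of positions covered by at least one feature."""
    coverage = {}
    for protein in ET.fromstring(xml_text).iter("protein"):
        positions = set()
        for feature in protein.iter("feature"):
            start = int(feature.find("start").get("start"))
            end = int(feature.find("end").get("end"))
            # Feature coordinates are treated as 1-based and inclusive (assumption).
            positions.update(range(start, end + 1))
        coverage[protein.get("id")] = len(positions)
    return coverage


print(covered_residues(FLPS_XML))  # {'A0A075B6Q4': 140}
print(covered_residues(CAST_XML))  # {'A0A075B6Q4': 37}
```

Counting the union of positions rather than summing feature lengths keeps overlapping fLPS features from being counted twice.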

trvinh commented 4 years ago

I think the "whole" bias type (the WHOLE_{...} features) should be removed from the annotation, since it always covers the whole length of the sequence. After removing the "whole" bias type, the FAS score increased from 0.32676 to 0.39214.
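One way to drop these features is to filter the annotation XML after fLPS has run. The following is a rough sketch under the assumption that the "whole" bias type always appears as a `type` attribute starting with `WHOLE_`, as in the output above. The function name `drop_whole_features` is hypothetical and this is not necessarily how annoFAS handles it.

```python
import xml.etree.ElementTree as ET

# Entry for A0A024R1R8 copied from the fLPS output in this thread.
SAMPLE = """<tool name="fLPS">
    <protein id="A0A024R1R8" length="64">
        <feature type="SINGLE_{K}" instance="1">
            <start start="8"/>
            <end end="64"/>
        </feature>
        <feature type="SINGLE_{E}" instance="1">
            <start start="21"/>
            <end end="41"/>
        </feature>
        <feature type="WHOLE_{K}" instance="1">
            <start start="1"/>
            <end end="64"/>
        </feature>
    </protein>
</tool>"""


def drop_whole_features(xml_text):
    """Return the parsed annotation tree without WHOLE_* features."""
    root = ET.fromstring(xml_text)
    for protein in root.iter("protein"):
        # Iterate over a copy because features are removed while looping.
        for feature in list(protein.findall("feature")):
            if feature.get("type", "").startswith("WHOLE_"):
                protein.remove(feature)
    return root


filtered = drop_whole_features(SAMPLE)
remaining = [feature.get("type") for feature in filtered.iter("feature")]
print(remaining)  # ['SINGLE_{K}', 'SINGLE_{E}']
```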