erhard-lab / gedi

Are CSI bam indexes accepted? #8

Closed rwhetten closed 5 months ago

rwhetten commented 5 months ago

I installed Maven and Gedi following the README, and gedi -e Version returns

2024-05-27 10:24:19.978 INFO OS: Linux 5.15.0-105-generic amd64
2024-05-27 10:24:20.016 INFO Java: OpenJDK 64-Bit Server VM 11.0.22+7-post-Ubuntu-0ubuntu220.04.1
2024-05-27 10:24:20.086 INFO Discovering classes in classpath
2024-05-27 10:24:20.273 INFO Preparing simple class references
2024-05-27 10:24:20.399 INFO Gedi version 1.0.6a (JAR) startup
2024-05-27 10:24:20.400 INFO Command: gedi -e Version
2024-05-27 10:24:20.460 INFO Finished: gedi -e Version

so it seems installation worked OK. When I try to run bamlist2cit on a list of BAM files of RNA-seq reads (total RNA and ribo-seq) aligned to the wheat genome, I get this exception:

Exception in thread "main" htsjdk.samtools.SAMException: No index is available for this BAM file.

All BAM files have CSI indexes created using samtools index -c, because individual chromosomes are >512 Mb, which is beyond what a standard BAI index can handle. Are these indexes suitable as input to bamlist2cit? If not, is there an alternative? A workaround would be to break the chromosome sequences in half and create new alignments to the modified genome reference file, but that is not my preference.
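
For readers who want to check which reference sequences force the use of CSI, here is a minimal sketch. It assumes the reference has a .fai index from samtools faidx (column 1 is the sequence name, column 2 its length) and uses the BAI coordinate limit of 2^29 bases. The function name and the file name in the usage comment are hypothetical.

```shell
# BAI indexes cannot address coordinates beyond 2^29 bp (536,870,912).
BAI_MAX_LEN=536870912

# Print name and length of every sequence in a .fai file that is too long for BAI.
# Argument: path to a .fai file produced by `samtools faidx`.
seqs_needing_csi() {
    awk -v max="$BAI_MAX_LEN" '$2 > max { print $1 "\t" $2 }' "$1"
}

# Usage (hypothetical file name):
#   samtools faidx wheat_genome.fa
#   seqs_needing_csi wheat_genome.fa.fai
```

If the function prints nothing, every sequence fits within the BAI limit and a plain samtools index would be enough.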

isaacvock commented 5 months ago

I am not affiliated with the gedi project, but I can tell you that gedi's pom.xml file specifies version 2.0.0 of bigwig, which for me pulls in version 2.16.2 of htsjdk, the library that is throwing the error you mentioned. You can confirm whether the same is true for you by running mvn dependency:tree from inside your cloned Gedi directory and checking which version of htsjdk is listed under the bigwig dependency. Looking at the htsjdk releases, it seems that CSI support was not introduced until version 2.19.0. Thus, unless the developers bump the version of htsjdk that gets installed when building gedi, it will not be able to support CSI indices.

Best, Isaac
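
The mvn dependency:tree output can be long, so it may help to filter it down to the htsjdk lines. This is only a sketch: the helper name htsjdk_in_tree is hypothetical and is not part of Maven or Gedi.

```shell
# Keep only the lines of a Maven dependency tree that mention htsjdk.
# Reads the tree on stdin, so it works on live output or on a saved log.
htsjdk_in_tree() {
    grep -i 'htsjdk'
}

# Usage, from inside the cloned Gedi directory (where pom.xml lives):
#   mvn dependency:tree | htsjdk_in_tree
```

More than one matching line suggests that two htsjdk versions are being pulled in at once.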

isaacvock commented 5 months ago

As an update, I found a workaround that allowed me to create CIT files from CSI-indexed BAM files, but I have not tested whether it affects any other gedi tool. The workaround is to add htsjdk version 2.19.0 as a dependency in the pom.xml file. I edited the pom.xml file in the cloned gedi/Gedi directory, changing it from:

    ...
    <dependency>
        <groupId>org.broad.igv</groupId>
        <artifactId>bigwig</artifactId>
        <version>2.0.0</version>
    </dependency>
    ...

to the following, which includes the additional dependency:

    ...
    <dependency>
        <groupId>org.broad.igv</groupId>
        <artifactId>bigwig</artifactId>
        <version>2.0.0</version>
    </dependency>
    <!-- added: htsjdk 2.19.0, the first release with CSI index support -->
    <dependency>
        <groupId>com.github.samtools</groupId>
        <artifactId>htsjdk</artifactId>
        <version>2.19.0</version>
    </dependency>
    ...

I then rebuilt gedi. The exact placement of this dependency block is somewhat arbitrary; I just wanted to make clear where I added it.

rwhetten commented 5 months ago

@isaacvock - Thanks for your comments! I made that change, deleted the old version, and built the new version with Maven, but I still get the same error about no index for BAM file when I run the bamlist2cit command. There may be something left of the original version - I'm not familiar with Maven, so I don't know if it keeps stuff around in other places in addition to the directory where the mvn -f $INSTALL_DIR/gedi/Gedi package command is executed. I'll keep looking...

isaacvock commented 5 months ago

Interesting... I think as long as you delete the entirety of the folder named bin in the README example, you should be good (i.e., the folder you ran mvn -f ... in).

For full context, here is what I did: I reproduced the error, ran rm -r bin/, edited the pom.xml file, and rebuilt gedi, after which it worked. I don't add gedi to my PATH variable, so I had to edit the copy of bamlist2cit in the bin folder to use the full path to gedi in that folder. I then ran ./bamlist2cit mylist.bamlist to convert the BAM files to CIT files.

Best of luck, I hope you can get it working!

rwhetten commented 5 months ago

After building a new version of gedi with the modified pom.xml file, I found that the lib directory contained both htsjdk v2.16.2 and v2.19.0. I moved v2.16.2 out of the lib/ directory and made a symlink with the same name that points to v2.19.0, so if bigwig calls v2.16.2, it will get v2.19.0 instead. The bamlist2cit command now runs without producing the "no index is available" error; time will tell if it can actually complete the job of producing a CIT file. Thanks again for your comments!

rwhetten commented 5 months ago

Just a confirmation that the addition of htsjdk-2.19.0.jar by editing the pom.xml file, and redirecting calls to htsjdk-igb-2.16.2.jar to the v2.19.0.jar file, worked to allow bamlist2cit to convert BAM files with CSI indexes to CIT format. Time will tell if other aspects of the Gedi package work correctly, but I'll close this for now.
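
The jar swap described in this thread can be sketched as a small shell function. It is an illustration rather than an official Gedi procedure. The function name and the lib_backup directory are hypothetical, and the jar names in the usage comment are the ones reported above, which may differ in other builds.

```shell
# Replace an old jar with a symlink to a newer one, keeping a backup copy.
# Arguments: <lib dir> <old jar name> <new jar name>
redirect_jar() {
    lib_dir=$1; old_jar=$2; new_jar=$3
    # Refuse to continue if the replacement jar is missing.
    [ -f "$lib_dir/$new_jar" ] || { echo "missing $lib_dir/$new_jar" >&2; return 1; }
    # Move the old jar out of lib/ into a sibling backup directory.
    mkdir -p "$lib_dir/../lib_backup"
    mv "$lib_dir/$old_jar" "$lib_dir/../lib_backup/" || return 1
    # Relative link: the old name now resolves to the new jar in the same directory.
    ln -s "$new_jar" "$lib_dir/$old_jar"
}

# Usage, from the directory that contains lib/:
#   ls lib/ | grep -i htsjdk
#   redirect_jar lib htsjdk-igb-2.16.2.jar htsjdk-2.19.0.jar
```

Keeping the old jar in a backup directory makes the change easy to undo if another Gedi tool turns out to depend on the older htsjdk.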