caozhichongchong / arg_ranker

MIT License

Dtype warning for mothertable #13

Closed (kramersu closed this issue 1 year ago)

kramersu commented 1 year ago

Hello,

Thanks for the great tool. I ran arg_ranker on 65 merged metagenome fastq reads and got the DtypeWarning below, which relates to the mothertable. The output tables look correct.

I am also wondering whether I should alter the script to account for the fact that I am using merged metagenome reads. I saw that the arg_table_sum script multiplies the 16S-copy variable by a factor of 2 to account for paired reads. Do you agree that I should set this factor back to 1 to account for read merging?

Many thanks for your help, Susanne

/home/suk000/miniconda3/envs/argranker_env/lib/python3.11/site-packages/arg_ranker/main.py:261: DtypeWarning: Columns (3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(Mothertable, index_col=None, header=None, skipinitialspace=True, sep='\t')

caozhichongchong commented 1 year ago

Hi Susanne,

Thank you for reaching out!

Yes, I totally agree that the multiplication factor should be set to 1 for merged reads. In ARG_table.sum.py, line 55 would then read: copy_16S = float(lines_set[1])*1 (the trailing "# pair end" comment on that line no longer applies once the factor is 1).
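To make the effect of that factor concrete, here is a minimal sketch of the scaling step. It is not the code in ARG_table.sum.py: the function name `scaled_16s_copies` and the `reads_merged` flag are hypothetical, and the only things taken from this thread are the factor of 2 for paired-end reads and the factor of 1 for merged reads.

```python
def scaled_16s_copies(raw_value: str, reads_merged: bool) -> float:
    """Scale a 16S copy value read from a table field (hypothetical helper).

    Assumption from this thread: paired-end input uses a factor of 2,
    merged reads use a factor of 1.
    """
    factor = 1 if reads_merged else 2
    return float(raw_value) * factor


# The same raw field, interpreted for merged and for paired-end input.
merged_copies = scaled_16s_copies("12.5", reads_merged=True)
paired_copies = scaled_16s_copies("12.5", reads_merged=False)
```

With merged reads the value passes through unchanged, so any abundance normalized by it is computed against half the paired-end figure.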

I think this warning appears because pandas was processing a big mothertable without knowing the data type of each column. I have now set low_memory=False. You can simply ignore this warning :)
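For anyone hitting the same warning in their own scripts, the sketch below shows the two options that the pandas message itself suggests. The tiny in-memory table is made up for illustration and is not a real mothertable; only the `read_csv` arguments mirror the call quoted in the warning.

```python
import io

import pandas as pd

# Made-up stand-in for a mothertable: tab-separated, no header row,
# with a last column that mixes numbers and text.
sample = "geneA\tclass1\t0.5\ngeneB\tclass2\tNA_text\ngeneC\tclass3\t3\n"

# Option 1: read the file in one pass instead of in chunks, so type
# inference for each column is based on the whole column.
df = pd.read_csv(io.StringIO(sample), index_col=None, header=None,
                 skipinitialspace=True, sep="\t", low_memory=False)

# Option 2: state the type up front; here every column is read as text.
df_str = pd.read_csv(io.StringIO(sample), index_col=None, header=None,
                     skipinitialspace=True, sep="\t", dtype=str)
```

Option 1 trades some extra memory during parsing for consistent inference, which is why a very large table may take longer or need more RAM to load. Option 2 avoids inference entirely but leaves any numeric conversion to later code.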

If reading the table turns out to take a long time, please let me know!

Best regards, Anni